Introduction (1)

Text and speech processing: hard problems 

• Theory of automata 

• Appropriate level of abstraction 

• Well-defined algorithmic problems 
Introduction (2)

Three Sections: 

• Algorithms for text and speech processing (2h) 

• Speech recognition (2h) 

• Finite-state methods for language processing (2h) 
Definitions: finite automata (1)

$A = (\Sigma, Q, \delta, I, F)$

• Alphabet $\Sigma$,

• Finite set of states $Q$,

• Transition function $\delta: Q \times \Sigma \to 2^Q$,

• $I \subseteq Q$ set of initial states,

• $F \subseteq Q$ set of final states.

A recognizes $L(A) = \{w \in \Sigma^* : \delta(I, w) \cap F \neq \emptyset\}$
(Hopcroft and Ullman, 1979; Perrin, 1990) 

Theorem 1 (Kleene, 1965). A set is regular (or rational) iff it can be 
recognized by a finite automaton. 
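The acceptance condition $\delta(I, w) \cap F \neq \emptyset$ can be checked by tracking the whole set of reachable states at once, one input symbol at a time. A minimal sketch, assuming a hypothetical dictionary encoding of δ and a hand-built automaton for $\Sigma^* aba$:

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding: delta[(state, symbol)] = set of next states.
Delta = Dict[Tuple[int, str], Set[int]]


def accepts(delta: Delta, initial: Set[int], final: Set[int], w: str) -> bool:
    """Return True iff delta(I, w) ∩ F is non-empty."""
    current = set(initial)            # delta(I, prefix read so far)
    for symbol in w:
        current = {r for q in current for r in delta.get((q, symbol), set())}
        if not current:               # every path is blocked
            return False
    return bool(current & final)


# Automaton for Σ*aba over Σ = {a, b}; states 0..3, initial 0, final 3.
delta: Delta = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
    (2, "a"): {3},
}
print(accepts(delta, {0}, {3}, "bbaba"))  # True
print(accepts(delta, {0}, {3}, "abab"))   # False
```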
Definitions: finite automata (2)

Figure 1: $L(A) = \Sigma^* aba$.
Definitions: weighted automata (1)

$A = (\Sigma, Q, \delta, \sigma, \lambda, \rho, I, F)$

• $(\Sigma, Q, \delta, I, F)$ is an automaton,

• Initial output function $\lambda$,

• Output function $\sigma: Q \times \Sigma \times Q \to K$,

• Final output function $\rho$,

• Function $f: \Sigma^* \to (K, +, \cdot)$ associated with A:

$\forall u \in \mathrm{Dom}(f), \quad f(u) = \sum_{(i, q) \in I \times (\delta(i, u) \cap F)} \lambda(i) \cdot \sigma(i, u, q) \cdot \rho(q)$
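Because · distributes over +, $f(u)$ can be computed without enumerating paths, by a forward pass that keeps one semiring sum per state. This is an illustrative sketch only: the `Semiring` class, the arc encoding, and the toy weights are assumptions, not part of the definition above. Swapping the semiring turns the same computation from a best-path weight into a sum over paths.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Semiring:
    plus: Callable[[float, float], float]
    times: Callable[[float, float], float]
    zero: float
    one: float


TROPICAL = Semiring(min, lambda a, b: a + b, float("inf"), 0.0)      # (min, +)
REAL = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)    # (+, ·)

# Hypothetical encoding: arcs[(q, a)] = [(next_state, sigma weight), ...].
Arcs = Dict[Tuple[int, str], List[Tuple[int, float]]]


def weight(K: Semiring, arcs: Arcs, lam: Dict[int, float],
           rho: Dict[int, float], u: str) -> float:
    """f(u): semiring sum over accepting paths of lambda(i)·sigma(path)·rho(q)."""
    forward = dict(lam)               # length-0 paths start at initial states
    for a in u:
        nxt: Dict[int, float] = {}
        for q, w in forward.items():
            for r, s in arcs.get((q, a), []):
                nxt[r] = K.plus(nxt.get(r, K.zero), K.times(w, s))
        forward = nxt
    total = K.zero
    for q, w in forward.items():
        if q in rho:                  # q is a final state
            total = K.plus(total, K.times(w, rho[q]))
    return total


arcs: Arcs = {(0, "a"): [(1, 0.4), (2, 0.6)], (1, "b"): [(3, 0.5)], (2, "b"): [(3, 0.5)]}
print(weight(TROPICAL, arcs, {0: 0.0}, {3: 0.0}, "ab"))   # best path: 0.4 + 0.5
print(weight(REAL, arcs, {0: 1.0}, {3: 1.0}, "ab"))       # 0.4·0.5 + 0.6·0.5
```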
Definitions: weighted automata (2)
Definitions: rational power series

• Power series: functions mapping $\Sigma^*$ to a semiring $(K, +, \cdot)$

- Notation: $S = \sum_{w \in \Sigma^*} (S, w)\, w$, where the $(S, w)$ are the coefficients

- Support: $\mathrm{supp}(S) = \{w \in \Sigma^* : (S, w) \neq 0\}$

- Sum: $(S + T, w) = (S, w) + (T, w)$

- Star: $S^* = \sum_{n \geq 0} S^n$

- Product: $(ST, w) = \sum_{uv = w,\ u, v \in \Sigma^*} (S, u)(T, v)$

• Rational power series: closure of the polynomials (power series with finite support) under the rational operations (Salomaa and Soittola, 1978; Berstel and Reutenauer, 1988)

Theorem 2 (Schützenberger, 1961). A power series is rational iff it can be represented by a weighted finite automaton.
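For polynomials (finite support), sum and product can be computed directly from the definitions above; star generally yields infinite support and is omitted here. A minimal sketch over $(\mathbb{R}, +, \cdot)$, storing a series as a hypothetical word-to-coefficient dictionary:

```python
from collections import defaultdict
from typing import Dict

# A polynomial (finite-support series) over (R, +, ·):
# maps each word w to its coefficient (S, w).
Series = Dict[str, float]


def series_sum(S: Series, T: Series) -> Series:
    """(S + T, w) = (S, w) + (T, w)."""
    out: Dict[str, float] = defaultdict(float)
    for P in (S, T):
        for w, c in P.items():
            out[w] += c
    return {w: c for w, c in out.items() if c != 0}   # keep only supp(S + T)


def series_product(S: Series, T: Series) -> Series:
    """(ST, w) = sum over all factorizations w = uv of (S, u)(T, v)."""
    out: Dict[str, float] = defaultdict(float)
    for u, cu in S.items():
        for v, cv in T.items():
            out[u + v] += cu * cv    # each pair (u, v) contributes to w = uv
    return {w: c for w, c in out.items() if c != 0}


S = {"": 1.0, "a": 2.0}
T = {"b": 3.0, "ab": 1.0}
print(series_product(S, T))   # {'b': 3.0, 'ab': 7.0, 'aab': 2.0}
```

Enumerating all pairs (u, v) is equivalent to enumerating every factorization w = uv, since each pair contributes to exactly one word; here "ab" receives 1·1 from ("", "ab") and 2·3 from ("a", "b").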
Definitions: transducers (1)

$T = (\Sigma, \Delta, Q, \delta, \sigma, I, F)$

• Finite alphabets $\Sigma$ and $\Delta$,

• Finite set of states $Q$,

• Transition function $\delta: Q \times \Sigma \to 2^Q$,

• Output function $\sigma: Q \times \Sigma \times Q \to \Delta^*$,

• $I \subseteq Q$ set of initial states,

• $F \subseteq Q$ set of final states.

T defines a relation:

$R(T) = \{(u, v) \in \Sigma^* \times \Delta^* : v \in \bigcup_{q \in (\delta(I, u) \cap F)} \sigma(I, u, q)\}$
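Since δ reads exactly one input symbol per transition, the outputs paired with an input u can be collected by a left-to-right simulation over (state, output so far) configurations. A sketch under an assumed dictionary encoding; all names are illustrative:

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: arcs[(q, a)] = [(next_state, output_string), ...].
Arcs = Dict[Tuple[int, str], List[Tuple[int, str]]]


def outputs(arcs: Arcs, initial: Set[int], final: Set[int], u: str) -> Set[str]:
    """All v such that (u, v) is in R(T)."""
    # Each configuration is (current state, output emitted so far).
    configs = {(i, "") for i in initial}
    for a in u:
        configs = {(r, out + o)
                   for q, out in configs
                   for r, o in arcs.get((q, a), [])}
    return {out for q, out in configs if q in final}


arcs: Arcs = {(0, "a"): [(0, "b"), (1, "c")], (1, "a"): [(1, "cc")]}
print(sorted(outputs(arcs, {0}, {0, 1}, "aa")))   # ['bb', 'bc', 'ccc']
```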
Definitions: transducers (2)
Definitions: weighted transducers

Figure 4: Example, $aaba \mapsto (bbcb, (0 \otimes 1 \otimes 0) \oplus (0 \otimes 1 \otimes 1 \otimes 0))$.

(min, +): $aaba \mapsto \min\{1, 2\} = 1$
(+, ·): $aaba \mapsto 0 \cdot 1 \cdot 0 + 0 \cdot 1 \cdot 1 \cdot 0 = 0$
Composition: Motivation (1)

• Construction of complex sets or functions from more elementary ones

• Modular (modules, distinct linguistic descriptions)

• On-the-fly expansion
Composition: Motivation (2)

[Figure: source program → lexical analyzer → syntax analyzer → semantic analyzer → intermediate code generator → code optimizer → code generator → target program]

Figure 5: Phases of a Compiler (Aho et al., 1986).
Composition: Motivation (3)

[Figure: source text → spellchecker → inflected forms → index → set of positions]

Figure 6: Complex indexation.
Composition: Example (1)




Figure 7: Composition of transducers. 
Composition: Example (2)




Figure 8: Composition of weighted transducers (+, •). 
Composition: Algorithm (1)

Construction of pairs of states

- Match: $q_1 \xrightarrow{a:b/w_1} q_1'$ and $q_2 \xrightarrow{b:c/w_2} q_2'$

- Result: $(q_1, q_2) \xrightarrow{a:c/(w_1 \otimes w_2)} (q_1', q_2')$

Elimination of ε-path redundancy: filter
Complexity: quadratic
On-the-fly implementation
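A minimal sketch of the pairs-of-states construction for ε-free weighted transducers over (min, +), building only the pairs reachable from the start pair. It assumes a hypothetical tuple encoding and does not implement the ε-filter:

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical encoding of an epsilon-free weighted transducer over (min, +):
# (start state, {final state: final weight}, {state: [(in, out, weight, next)]}).
Arc = Tuple[str, str, float, int]
Fst = Tuple[int, Dict[int, float], Dict[int, List[Arc]]]


def compose(t1: Fst, t2: Fst):
    """Pairs-of-states construction restricted to pairs reachable from the start."""
    s1, f1, a1 = t1
    s2, f2, a2 = t2
    start = (s1, s2)
    arcs: Dict[Tuple[int, int], list] = {}
    finals: Dict[Tuple[int, int], float] = {}
    queue, seen = deque([start]), {start}
    while queue:
        pair = queue.popleft()
        q1, q2 = pair
        if q1 in f1 and q2 in f2:
            finals[pair] = f1[q1] + f2[q2]          # final weights combine with ⊗ = +
        arcs[pair] = []
        for a, b, w1, n1 in a1.get(q1, []):
            for b2, c, w2, n2 in a2.get(q2, []):
                if b == b2:                          # a:b/w1 matches b:c/w2
                    nxt = (n1, n2)
                    arcs[pair].append((a, c, w1 + w2, nxt))   # a:c/(w1 ⊗ w2)
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return start, finals, arcs


t1: Fst = (0, {1: 0.0}, {0: [("a", "b", 1.0, 1), ("a", "d", 4.0, 1)]})
t2: Fst = (0, {1: 0.5}, {0: [("b", "c", 2.0, 1)]})
start, finals, arcs = compose(t1, t2)
print(arcs[start])   # [('a', 'c', 3.0, (1, 1))]
```

The a:d arc of the first transducer has no matching input in the second one, so it produces no transition; the work per pair is proportional to the product of the out-degrees, which is the source of the quadratic bound.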
Composition: Algorithm (2)

Figure 9: Composition of weighted transducers with ε-transitions.
Composition: Algorithm (3)

Figure 10: Redundancy of ε-paths.
Composition: Algorithm (4)

Figure 11: Filter for efficient composition.
Composition: Theory

• Transductions (Elgot and Mezei, 1965; Eilenberg, 1974, 1976; Berstel, 1979).

• Theorem 3. Let $t_1$ and $t_2$ be two (weighted) (automata + transducers); then $t_1 \circ t_2$ is a (weighted) (automaton + transducer).

• Efficient composition of weighted transducers (Mohri, Pereira, and Riley, 1996).

• Works with any semiring

• Intersection: composition of (weighted) automata.
Intersection: Example

Figure 12: Intersection of automata.



M. Mohri, M. Riley, R. Sproat
Algorithms for Speech Recognition and Language Processing
PART I







Determinization: Motivation (1) 

• Efficiency of use (time) 

• Elimination of redundancy 

• No loss of information (≠ pruning)
Determinization: Motivation (2)




Figure 14: Toy language model (16 states, 53 transitions, 162 paths). 
Determinization: Motivation (3)

Figure 15: Determinized language model (9 states, 11 transitions, 4 paths).
Determinization: Example (2)




Figure 17: Determinization of weighted automata (min, +). 
Determinization: Example (3)




Figure 18: Determinization of transducers. 
Determinization: Example (4)

Figure 19: Determinization of weighted transducers (min, +).
Determinization: Algorithm (1)

• Generalization of the classical algorithm for automata 

- Powerset construction 

- Subsets made of (state, weight) or (state, string, weight) 

• Applies to subsequentiable weighted automata and transducers 

• Time and space complexity: exponential (polynomial w.r.t. size of 
the result) 

• On-the-fly implementation
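A sketch of the weighted powerset construction over (min, +) for acceptors, where each subset holds (state, residual weight) pairs and each new transition carries the minimum weight, leaving the remainders as residuals. Assumptions: a single initial state, a hypothetical dictionary encoding, and an input that admits determinization (e.g., acyclic). On inputs without the twin property the loop need not terminate, and a robust version would compare residual weights up to a tolerance; this sketch does not.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding of a weighted acceptor over (min, +):
# arcs[q] = [(label, weight, next_state)], finals[q] = rho(q); one initial state.
Subset = FrozenSet[Tuple[int, float]]          # pairs (state, residual weight)


def determinize(start: int, arcs: Dict[int, List[Tuple[str, float, int]]],
                finals: Dict[int, float]):
    init: Subset = frozenset({(start, 0.0)})
    det_arcs: Dict[Subset, List[Tuple[str, float, Subset]]] = {}
    det_finals: Dict[Subset, float] = {}
    queue, seen = deque([init]), {init}
    while queue:
        S = queue.popleft()
        final_weights = [v + finals[q] for q, v in S if q in finals]
        if final_weights:
            det_finals[S] = min(final_weights)
        # For each label, the best weight with which each next state is reached.
        by_label: Dict[str, Dict[int, float]] = {}
        for q, v in S:
            for a, w, n in arcs.get(q, []):
                dest = by_label.setdefault(a, {})
                dest[n] = min(dest.get(n, float("inf")), v + w)
        det_arcs[S] = []
        for a, dest in sorted(by_label.items()):
            w_min = min(dest.values())             # weight output on the new arc
            T: Subset = frozenset((n, x - w_min) for n, x in dest.items())
            det_arcs[S].append((a, w_min, T))
            if T not in seen:
                seen.add(T)
                queue.append(T)
    return init, det_arcs, det_finals


# Two "ab" paths with weights 1 + 3 and 2 + 1; the result has one path of weight 3.
arcs = {0: [("a", 1.0, 1), ("a", 2.0, 2)], 1: [("b", 3.0, 3)], 2: [("b", 1.0, 3)]}
init, det_arcs, det_finals = determinize(0, arcs, {3: 0.0})
```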
Determinization: Algorithm (2)

Conditions of application

• Twin states: q and q' are twin states iff:

- If: they can be reached from the initial states by the same input string u

- Then: cycles at q and q' with the same input string v have the same output value

• Theorem 4 (Choffrut, 1978; Mohri, 1996a). Let τ be an unambiguous weighted automaton (transducer, weighted transducer); then τ can be determinized iff it has the twin property.

• Theorem 5 (Mohri, 1996a) The twin property can be tested in 
polynomial time. 
Determinization: Theory

• Determinization of automata

- General case (Aho, Sethi, and Ullman, 1986)

- Specific case of failure functions (Mohri, 1995)

• Determinization of transducers, weighted automata, and weighted transducers

- General description, theory and analysis (Mohri, 1996a; Mohri, 1996b)

- Conditions of application and test algorithm

- Acyclic weighted transducers or transducers admit determinization

• Can be used with other semirings (e.g., $(\mathbb{R}, +, \cdot)$)
Local determinization: Motivation

• Time efficiency 

• Reduction of redundancy 

• Control of the resulting size (flexibility) 

• Equivalent function (or equal set) 

• No loss of information 
Local determinization: Example

Figure 20: Local determinization of weighted transducers (min, +).
Local determinization: Algorithm

Predicate, e.g.: (P) $\text{out-degree}(q) > k$
k: threshold parameter
Local: $\mathrm{Dom}(det) = \{q : P(q)\}$
Determinization only for $q \in \mathrm{Dom}(det)$
On-the-fly implementation

Complexity $O(|\mathrm{Dom}(det)| \cdot \max_q \text{out-degree}(q))$
Local determinization: theory

• Various choices of predicate (constraint: local) 

• Definition of parameters 

• Applies to all automata, weighted automata, transducers, and 
weighted transducers 

• Can be used with other semirings (e.g., $(\mathbb{R}, +, \cdot)$)
Minimization: Motivation

• Space efficiency 

• Equivalent function (or equal set) 

• No loss of information (≠ pruning)
Minimization: Motivation (2)
Minimization: Motivation (3)

Figure 22: Minimized language model.
Minimization: Example (1)
Minimization: Example (2)




Figure 24: Minimization of weighted automata (min, +). 
Minimization: Example (3)




Figure 25: Minimization of transducers. 
Minimization: Example (4)




Figure 26: Minimization of weighted transducers (min, +). 
Minimization: Algorithm (1)

• Two steps

- Pushing or extraction of strings or weights towards the initial state

- Classical minimization of automata, each (input, output) pair considered as a single label

• Algorithm for the first step 

- Transducers: specific algorithm 

- Weighted automata: shortest-paths algorithms 
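The first step for weighted automata can be sketched with shortest distances: with $d(q)$ the (min, +) distance from q to a final state, a transition of weight w from q to n gets weight $w + d(n) - d(q)$, each final weight $\rho(q)$ becomes $\rho(q) - d(q)$, and $d(\text{start})$ becomes the initial weight. This sketch assumes an acyclic input (so a memoized recursion computes d) and a hypothetical encoding; the second step, classical minimization with (label, weight) pairs as labels, is not shown.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical encoding of an acyclic weighted acceptor over (min, +):
# arcs[q] = [(label, weight, next_state)], finals[q] = rho(q).
Arcs = Dict[int, List[Tuple[str, float, int]]]


def push_weights(start: int, arcs: Arcs, finals: Dict[int, float]):
    """Move weights toward the initial state using shortest distances to F."""
    d: Dict[int, float] = {}

    def dist(q: int) -> float:        # d(q) = min cost from q to a final state
        if q not in d:
            best = finals.get(q, math.inf)
            for _, w, n in arcs.get(q, []):
                best = min(best, w + dist(n))
            d[q] = best
        return d[q]

    for q in [start, *arcs, *finals]:
        dist(q)
    # States and arcs that cannot reach a final state (d = inf) are dropped.
    pushed = {q: [(a, w + d[n] - d[q], n) for a, w, n in out if d[n] < math.inf]
              for q, out in arcs.items() if d[q] < math.inf}
    new_finals = {q: rho - d[q] for q, rho in finals.items()}
    return d[start], pushed, new_finals      # d[start] becomes the initial weight


arcs = {0: [("a", 1, 1), ("b", 4, 1)], 1: [("c", 2, 2)]}
initial_weight, pushed, new_finals = push_weights(0, arcs, {2: 3})
print(initial_weight, pushed, new_finals)
```

After pushing, every path keeps its total weight (here 6 for "ac" and 9 for "bc"), but the weights now sit as early as possible, which is what lets the classical algorithm merge equivalent states.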
Minimization: Algorithm (2)

• Complexity

- E: set of transitions

- S: sum of the lengths of output strings

- |P_max|: the longest of the longest common prefixes of the output paths leaving each state

Type                General                                 Acyclic
Automata            O(|E| · log |Q|)                        O(|Q| + |E|)
Weighted automata   O(|E| · log |Q|)                        O(|Q| + |E|)
Transducers         O(|Q| + |E| · (log |Q| + |P_max|))      O(S + |E| + |Q| + (|E| - (|Q| - |F|)) · |P_max|)
Minimization: Theory

• Minimization of automata (Aho, Hopcroft, and Ullman, 1974; Revuz, 1991)

• Minimization of transducers (Mohri, 1994) 

• Minimization of weighted automata (Mohri, 1996a) 

- Minimal number of transitions 

- Test of equivalence 

• Standardization of power series (Schützenberger, 1961)

- Works only with fields 

- Creates too many transitions 
Conclusion (1)

• Theory 

- Rational power series 

- Weighted automata and transducers 

• Algorithms 

- General (various semirings) 

- Efficiency (used in practice, large sizes) 
Conclusion (2)

• Applications 

- Text processing 

(spelling checkers, pattern-matching, indexation, OCR) 

- Language processing 

(morphology, phonology, syntax, language modeling) 

- Speech processing (speech recognition, text-to-speech synthesis)

- Computational biology (matching with errors) 

- Many other applications 
Overview

• The speech recognition problem 

• Acoustic, lexical and grammatical models 

• Finite-state automata in speech recognition

• Search in finite-state automata 
Speech Recognition

Given an utterance, find its most likely written transcription.
Fundamental ideas:

• Utterances are built from sequences of units

• Acoustic correlates of a unit are affected by surrounding units

• Units combine into units at a higher level: phones → syllables → words

• Relationships between levels can be modeled by weighted graphs; we use weighted finite-state transducers

• Recognition: find the best path in a suitable product graph 
Levels of Speech Representation

[Figure: labeled spectrogram displays for the utterance "recognize speech"; a second display is labeled "beach".]
Maximum A Posteriori Decoding

Overall analysis [4, 57]:

• Acoustic observations: parameter vectors derived by local spectral analysis of the speech waveform at regular (e.g., 10 msec) intervals

• Observation sequence o

• Transcriptions w

• Probability P(o|w) of observing o when w is uttered

• Maximum a posteriori decoding:

$\hat{w} = \operatorname{argmax}_w P(w \mid o) = \operatorname{argmax}_w \frac{P(o \mid w)\, P(w)}{P(o)} = \operatorname{argmax}_w \underbrace{P(o \mid w)}_{\text{generative model}} \; \underbrace{P(w)}_{\text{language model}}$
Generative Models of Speech

Typical decomposition of P(o|w) into conditionally independent mappings between levels:

• Acoustic model P(o|p): phone sequences → observation sequences. Detailed model:

- P(o|d): distributions → observation vectors (symbolic → quantitative)

- P(d|m): context-dependent phone models → distribution sequences

- P(m|p): phone sequences → model sequences

• Pronunciation model P(p|w): word sequences → phone sequences

• Language model P(w): word sequences
Recognition Cascades: General Form

Multistage cascade:

[Figure: $s_0$ → stage 1 → $s_1$ → ... → $s_{k-1}$ → stage k → $s_k$]

Find $s_0$ maximizing

$P(s_0, s_k) = P(s_k \mid s_0)\, P(s_0) = P(s_0) \sum_{s_1, \ldots, s_{k-1}} \prod_{1 \le j \le k} P(s_j \mid s_{j-1})$

"Viterbi" approximation:

$\mathrm{Cost}(s_0, s_k) = \mathrm{Cost}(s_k \mid s_0) + \mathrm{Cost}(s_0)$
$\mathrm{Cost}(s_k \mid s_0) \approx \min_{s_1, \ldots, s_{k-1}} \sum_{1 \le j \le k} \mathrm{Cost}(s_j \mid s_{j-1})$

where $\mathrm{Cost}(\ldots) = -\log P(\ldots)$.
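Under the Viterbi approximation, each stage can be viewed as a cost matrix, and the minimization over $s_1, \ldots, s_{k-1}$ becomes a chain of (min, +) matrix products. A toy sketch with made-up probabilities, comparing the approximate cost with the exact $-\log$ of the sum over intermediate values:

```python
import math
from typing import List

Matrix = List[List[float]]      # C[i][j] = Cost(next level = j | previous level = i)


def cost(p: float) -> float:
    """Cost(...) = -log P(...), with infinite cost for probability 0."""
    return -math.log(p) if p > 0 else math.inf


def min_plus(A: Matrix, B: Matrix) -> Matrix:
    """(A ⊗ B)[i][k] = min over j of A[i][j] + B[j][k]."""
    return [[min(A[i][j] + B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))]
            for i in range(len(A))]


def viterbi_cost(stages: List[Matrix]) -> Matrix:
    """Approximate Cost(s_k | s_0) by minimizing over s_1, ..., s_{k-1}."""
    result = stages[0]
    for C in stages[1:]:
        result = min_plus(result, C)
    return result


# Hypothetical two-stage cascade: one s_0 value, two s_1 values, two s_k values.
P1 = [[0.5, 0.5]]
P2 = [[0.9, 0.1], [0.2, 0.8]]
stages = [[[cost(p) for p in row] for row in P] for P in (P1, P2)]
approx = viterbi_cost(stages)[0][0]   # -log max(0.5·0.9, 0.5·0.2) = -log 0.45
exact = cost(0.5 * 0.9 + 0.5 * 0.2)   # -log 0.55: sums over s_1 instead
print(approx, exact)
```

The approximate cost is never lower than the exact one, since the best single path carries at most the total probability mass.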
Speech Recognition Problems

• Modeling: how to describe accurately the relations between levels → modeling errors

• Search: how to find the best interpretation of the observations according to the given models → search errors
Acoustic Modeling - Feature Selection I

• Short-time spectral analysis:

[Figure: short-time (25 msec Hamming window) spectrum of /ae/, dB vs. Hz.]

• Scale selection: 

- Cepstral smoothing 

- Parameter sampling (13 parameters) 






Acoustic Modeling - Feature Selection II [40, 38] 

• Refinements 

- Time derivatives - 1st and 2nd order 

- non-Fourier analysis (e.g., Mel scale) 

- speaker/channel adaptation 

* mean cepstral subtraction 

* vocal tract normalization 

* linear transformations 

• Result: 39-dimensional feature vector (13 cepstra, 13 delta cepstra, 13 delta-delta cepstra) every 10 milliseconds
Acoustic Modeling - Stochastic Distributions [4, 61, 39, 5]

• Vector quantization: find codebook of prototypes

• Full covariance multivariate Gaussians (d: feature dimension):

$P(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$

• Diagonal covariance Gaussian mixtures 

• Semi-continuous, tied mixtures 
Acoustic Modeling - Units and Training [61, 36]

• Units

- Phonetic (sub-word) units, e.g., cat → /k ae t/

- Context-dependent units: $ae_{k,t}$

- Multiple distributions (states) per phone: left, middle, right

• Training

- Given a segmentation, training is straightforward

- Obtain segmentation by transcription 

- Iterate until convergence 
Generating Lexicons - Two Steps

• Orthography → Phonemes

"had" → /hh ae d/
"your" → /y uw r/

- complex, context-independent mapping

- usually small number of alternatives

- determined by spelling constraints; lexical "facts"

- large online dictionaries available

• Phonemes → Phones

/hh ae d y uw r/ → [hh ae dcl jh axr] (60% prob)
/hh ae d y uw r/ → [hh ae dcl d y axr] (40% prob)

- complex, context-dependent mapping 

- many possible alternatives 

- determined by phonological and phonetic constraints 
Decision Trees: Overview [9]



• Description/Use: Simple structure - binary tree of decisions, 
terminal nodes determine prediction (cf. "Game of Twenty 
Questions"). If dependent variable is categorical (e.g., red, 
yellow, green), called "classification tree", if continuous, called 
"regression tree". 

• Creation/Estimation: Creating a binary decision tree for classification or regression involves three steps (Breiman et al.):

1. Splitting Rules: Which split to take at a node? 

2. Stopping Rules: When to declare a node terminal? 

3. Node Assignment: Which class/value to assign to a terminal node? 
1. Decision Tree Splitting Rules

Which split to take at a node?

Candidate splits considered.

- Binary cuts: for continuous $-\infty < x < \infty$, consider splits of form:

$x \le k$ vs. $x > k$, $\forall k$.

- Binary partitions: for categorical $x \in \{1, 2, \ldots, n\} = X$, consider splits of form:

$x \in A$ vs. $x \in X - A$, $\forall A \subset X$.
1. Decision Tree Splitting Rules - Continued

• Choosing best candidate split. 

- Method 1: Choose k (continuous) or A (categorical) that 
minimizes estimated classification (regression) error after split. 

- Method 2 (for classification): Choose k or A that minimizes 
estimated entropy after that split. 
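Method 2 for a continuous feature can be sketched as follows: try every cut between observed values and keep the one with the lowest size-weighted entropy of the two children. The helper names and toy data are illustrative; categorical partitions and Method 1 are not shown.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Empirical entropy (bits) of the class labels at a node."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def best_cut(xs: Sequence[float], ys: Sequence[str]) -> Tuple[float, float]:
    """Method 2: the k minimizing the size-weighted entropy of 'x <= k' vs. 'x > k'."""
    best_k, best_h = math.nan, math.inf
    for k in sorted(set(xs))[:-1]:          # cutting at max(xs) leaves one side empty
        left = [y for x, y in zip(xs, ys) if x <= k]
        right = [y for x, y in zip(xs, ys) if x > k]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(xs)
        if h < best_h:
            best_k, best_h = k, h
    return best_k, best_h


print(best_cut([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))
```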



[Figure: two candidate splits, SPLIT #1 and SPLIT #2.]
2. Decision Tree Stopping Rules

When to declare a node terminal?

• Strategy (cost-complexity pruning):

1. Grow over-large tree.

2. Form sequence of subtrees, $T_0, T_1, \ldots$, ranging from full tree to just the root node.

3. Estimate "honest" error rate for each subtree.

4. Choose tree size with minimum "honest" error rate.

• To form sequence of subtrees, vary α from 0 (for full tree) to ∞ (for just root node) in:

$\min_T \left[ R(T) + \alpha |\tilde{T}| \right]$, where $R(T)$ is the error of subtree T on the training data and $|\tilde{T}|$ its number of terminal nodes.

• To estimate "honest" error rate, test on data different from training 
data, e.g., grow tree on 9/10 of available data and test on 1/10 of data 
repeating 10 times and averaging (cross-validation). 
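Steps 2 to 4 can be sketched with hypothetical (training error, leaf count) pairs for a nested pruning sequence: α = 0 keeps the full tree, a large α selects the root, and the final choice uses the cross-validated error. The numbers below are invented for illustration, and the cross-validation itself is not implemented.

```python
from typing import List, Tuple

# Hypothetical pruning sequence T_0 (full tree) ... T_m (root only):
# each entry is (R(T): training error, |T~|: number of terminal nodes).
PruningSeq = List[Tuple[float, int]]


def select_for_alpha(seq: PruningSeq, alpha: float) -> int:
    """Index of the subtree minimizing R(T) + alpha * |T~|."""
    return min(range(len(seq)), key=lambda i: seq[i][0] + alpha * seq[i][1])


def select_by_honest_error(cv_error: List[float]) -> int:
    """Step 4: pick the subtree with minimum cross-validated ("honest") error."""
    return min(range(len(cv_error)), key=cv_error.__getitem__)


seq = [(0.10, 8), (0.12, 5), (0.20, 2), (0.45, 1)]     # made-up numbers
print([select_for_alpha(seq, a) for a in (0.0, 0.01, 1.0)])   # [0, 1, 3]
```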
End of Declarative Sentence Prediction: Pruning Sequence

[Figure: pruning sequence plotted against the number of terminal nodes (0 to 100); + = raw, o = cross-validated.]
3. Decision Tree Node Assignment



Which class/value to assign to a terminal node? 

• Plurality vote: Choose most frequent class at that node for 
classification; choose mean value for regression. 
End-of-Declarative-Sentence Prediction: Features [65]

• Prob[word with "." occurs at end of sentence]

• Prob[word after "." occurs at beginning of sentence]

• Length of word with "."

• Length of word after "."

• Case of word with ".": Upper, Lower, Cap, Numbers

• Case of word after ".": Upper, Lower, Cap, Numbers

• Punctuation after "." (if any)

• Abbreviation class of word with ".", e.g., month name, unit-of-measure, title, address name, etc.
End of Declarative Sentence?

[Figure: counts 5137/5283 and 133/152.]
Phoneme-to-Phone Alignment

PHONEME   PHONE   WORD
p         p       purpose
er        er
p         pcl
-         p
ax        ix
s         s
ae        ax      and
n         n
d         -
r         r       respect
ih        ix
s         s
p         pcl
-         p
eh        eh
k         kcl
t         t
Phoneme-to-Phone Realization: Features [66, 10, 62]

• Phonemic Context: 

- Phoneme to predict 

- Three phonemes to left 

- Three phonemes to right 

• Stress (0, 1, 2) 

• Lexical Position: 

- Phoneme count from start of word 

- Phoneme count from end of word 
Phoneme-to-Phone Realization: Prediction Example

Tree splits for /t/ in "your pretty red":

PHONE   COUNT    SPLIT
ix      182499
n       87283    cm0: vstp,ustp,vfri,ufri,vaff,uaff,nas
kcl+k   38942    cm0: vstp,ustp,vaff,uaff
tcl+t   21852    cp0: alv,pal
tcl+t   11928    cm0: ustp
tcl+t   5918     vm1: mono,rvow,wdi,ydi
dx      3639     cm-1: ustp,rho,n/a
dx      2454     rstr: n/a,no
Phoneme-to-Phone Realization: Network Example

Phonetic network for "Don had your pretty ...", shown as the phone realizations (with probabilities) predicted for each phoneme:

PHONEME   PHONES (PROBABILITY)
d         d 0.91
aa        aa 0.92
n         n 0.98
hh        hh 0.74, hv 0.15
ae        ae 0.73, eh 0.19
d         dcl jh 0.51, dcl d 0.37
y         y 0.90 (if d → dcl d); - 0.84, y 0.16 (if d → dcl jh)
uw        axr 0.48, er 0.29
r         - 0.99
p         pcl p 0.99
r         r 0.99
ih        ih 0.86
t         dx 0.73, tcl t 0.11
iy        iy 0.90
Acoustic Model Context Selection [92, 39]

• Statistical regression trees used to predict contexts based on 
distribution variance 

• One tree per context-independent phone and state (left, middle, right) 

• The trees were grown until the data criterion of 500 frames per 
distribution was met 

• Trees pruned using cost-complexity pruning and cross-validation to 
select best contexts 

• About 44000 context-dependent phone models 

• About 16000 distributions 
N-Grams: Basics

'Chain Rule' and Joint/Conditional Probabilities:

$P[x_1 x_2 \ldots x_n] = P[x_n \mid x_1 \ldots x_{n-1}]\, P[x_{n-1} \mid x_1 \ldots x_{n-2}] \cdots P[x_2 \mid x_1]\, P[x_1]$

where, e.g.,

$P[x_n \mid x_1 \ldots x_{n-1}] = \frac{P[x_1 \ldots x_n]}{P[x_1 \ldots x_{n-1}]}$

(First-order) Markov assumption:

$P[x_k \mid x_1 \ldots x_{k-1}] = P[x_k \mid x_{k-1}] = \frac{P[x_{k-1} x_k]}{P[x_{k-1}]}$

nth-order Markov assumption:

$P[x_k \mid x_1 \ldots x_{k-1}] = P[x_k \mid x_{k-n} \ldots x_{k-1}]$
N-Grams: Maximum Likelihood Estimation

Let N be the total number of n-grams observed in a corpus and $c(x_1 \ldots x_n)$ be the number of times the n-gram $x_1 \ldots x_n$ occurred. Then

$P[x_1 \ldots x_n] = \frac{c(x_1 \ldots x_n)}{N}$

is the maximum likelihood estimate of that n-gram probability.
For conditional probabilities,

$P[x_n \mid x_1 \ldots x_{n-1}] = \frac{c(x_1 \ldots x_n)}{c(x_1 \ldots x_{n-1})}$

is the maximum likelihood estimate.

With this method, an n-gram that does not occur in the corpus is assigned zero probability.
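A minimal bigram (n = 2) sketch of the conditional maximum likelihood estimate. Histories are counted only where they have a successor, so each history's estimates sum to one; the function name and toy corpus are illustrative:

```python
from collections import Counter
from typing import Dict, List, Tuple


def bigram_mle(tokens: List[str]) -> Dict[Tuple[str, str], float]:
    """P[x2 | x1] = c(x1 x2) / c(x1); bigrams never seen get no entry (zero)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    # c(x1) counted only where x1 has a successor, so estimates per history sum to 1.
    histories = Counter(tokens[:-1])
    return {(x1, x2): c / histories[x1] for (x1, x2), c in bigrams.items()}


p = bigram_mle("a b a b a c".split())
print(p[("a", "b")], p[("a", "c")], p.get(("b", "c"), 0.0))
```

The unseen bigram ("b", "c") gets probability zero, which is exactly the weakness that motivates the Good-Turing and Katz estimates.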
N-Grams: Good-Turing-Katz Estimation [29, 16]

Let $n_r$ be the number of n-grams that occurred r times. Then

$P[x_1 \ldots x_n] = \frac{c^*(x_1 \ldots x_n)}{N}$

is the Good-Turing estimate of that n-gram probability, where

$c^*(x) = (c(x) + 1)\, \frac{n_{c(x)+1}}{n_{c(x)}}$

For conditional probabilities,

$P[x_n \mid x_1 \ldots x_{n-1}] = \frac{c^*(x_1 \ldots x_n)}{c(x_1 \ldots x_{n-1})}, \quad c(x_1 \ldots x_n) > 0$

is Katz's extension of the Good-Turing estimate.

With this method, an n-gram that does not occur in the corpus is assigned the backoff probability $P[x_n \mid x_1 \ldots x_{n-1}] = \alpha\, P[x_n \mid x_2 \ldots x_{n-1}]$, where α is a normalizing constant.
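A sketch of the Good-Turing adjusted counts $c^* = (r + 1)\, n_{r+1} / n_r$ and of the mass $n_1 / N$ that this estimate reserves for unseen n-grams. The cutoff k, above which raw counts are kept, is an assumed parameter, and Katz's renormalization and computation of α are omitted:

```python
from collections import Counter
from typing import Dict, Hashable


def good_turing_counts(counts: Dict[Hashable, int], k: int = 5) -> Dict[Hashable, float]:
    """Adjusted counts c* = (r + 1) n_{r+1} / n_r for r <= k; larger counts kept as-is."""
    n = Counter(counts.values())            # n[r] = number of n-grams seen r times
    adjusted: Dict[Hashable, float] = {}
    for x, r in counts.items():
        if r <= k and n[r + 1] > 0:
            adjusted[x] = (r + 1) * n[r + 1] / n[r]
        else:
            adjusted[x] = float(r)          # no n_{r+1} estimate, or count deemed reliable
    return adjusted


def unseen_mass(counts: Dict[Hashable, int]) -> float:
    """Probability mass Good-Turing reserves for unseen n-grams: n_1 / N."""
    N = sum(counts.values())
    return sum(1 for r in counts.values() if r == 1) / N


toy = {("the", "cat"): 1, ("the", "dog"): 1, ("a", "cat"): 2, ("the", "end"): 3}
print(good_turing_counts(toy), unseen_mass(toy))
```

On such a tiny toy corpus the $n_r$ are noisy (here the count 2 is even adjusted upward); in practice the estimates are only meaningful with large counts of counts.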
Finite-State Modeling [57]

Our view of recognition cascades: represent mappings between levels, observation sequences, and language uniformly with weighted finite-state machines:

• Probabilistic mapping P(x|y): weighted finite-state transducer. For example, a word pronunciation transducer:

• Language model P(w): weighted finite-state acceptor
Example of Recognition Cascade

[Figure: cascade from observations o through phones to words.]

Recognition from observations o by composition:

- Observations: $O(s, s) = 1$ if $s = o$, 0 otherwise

- Acoustic-phone transducer: $A(a, p) = P(a \mid p)$

- Pronunciation dictionary: $D(p, w) = P(p \mid w)$

- Language model: $M(w, w) = P(w)$

Recognition: $\hat{w} = \operatorname{argmax}_w (O \circ A \circ D \circ M)(o, w)$
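Speech Models as Weighted Automata

Quantized observations:

[Figure: observation chain $t_0 \to t_1 \to t_2 \to \ldots$]

Phone model $A_\pi$: observations → phones

[Figure: three-state phone model with self-loops $o_i:\epsilon/p_{00}(i)$, $o_i:\epsilon/p_{11}(i)$, $o_i:\epsilon/p_{22}(i)$, transitions $o_i:\epsilon/p_{01}(i)$ and $o_i:\epsilon/p_{12}(i)$, and exit transition $\epsilon:\pi/p_{2f}$.]

Acoustic transducer: $A = \left(\sum_\pi A_\pi\right)^*$

Word pronunciations $D_{data}$: phones → words

[Figure: pronunciation network for "data".]

Dictionary: $D = \left(\sum_w D_w\right)^*$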
Example: Phone Lattice $O \circ A$

• Lattices: weighted acyclic graphs representing possible interpretations of an utterance as sequences of units at a given level of representation (phones, syllables, words, ...)

• Example: result of composing observation sequence for hostile battle with acoustic model:
Sample Pronunciation Dictionary D

Dictionary with hostile, battle and bottle as a weighted transducer:
Sample Language Model M

Simplified language model as a weighted acceptor: 
Recognition by Composition

• From phones to words: compose dictionary with phone lattice to 
yield word lattice with combined acoustic and pronunciation costs: 




• Applying language model: Compose word lattice with language 
model to obtain word lattice with combined acoustic, pronunciation 
and language model costs: 
Context-Dependency Examples

• Context-dependent phone models: maps from CI units to CD units. Example: ae/h_d → $ae_{h,d}$

• Context-dependent allophonic rules: maps from baseforms to detailed phones. Example: t/V_V → dx

• Difficulty: cross-word contexts; where several words enter and leave a state in the grammar, substitution does not apply.
Context-Dependency Transducers

Example: triphonic context transducer for two symbols x and y.
Generalized State Machines

All of the above networks have bounded context and thus can be represented as generalized state machines. A generalized state machine M:

• Supports these operations:

- M.start: returns start state

- M.final(state): returns 1 if final, 0 if non-final state

- M.arcs(state): returns transitions $(a_1, a_2, \ldots, a_N)$ leaving state, where $a_i$ = (ilabel, olabel, weight, nextstate)

• Does not necessarily support:

- providing the number of states

- expanding states that have not already been discovered
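The interface can be expressed in Python as a structural protocol. This is a sketch with hypothetical class names; the toy machine has an unbounded set of states, none of which exists until `arcs` is called on its predecessor, which is why the number of states need not be available:

```python
from typing import Hashable, List, NamedTuple, Protocol


class Arc(NamedTuple):
    ilabel: str
    olabel: str
    weight: float
    nextstate: Hashable


class StateMachine(Protocol):
    """The three operations a generalized state machine must support."""
    def start(self) -> Hashable: ...
    def final(self, state: Hashable) -> bool: ...
    def arcs(self, state: Hashable) -> List[Arc]: ...


class EvenLengthA:
    """Toy machine with an unbounded state space (states 0, 1, 2, ...) that
    accepts a^n for even n; states are created only when asked for."""
    def start(self) -> int:
        return 0

    def final(self, state: int) -> bool:
        return state % 2 == 0

    def arcs(self, state: int) -> List[Arc]:
        return [Arc("a", "a", 1.0, state + 1)]


m: StateMachine = EvenLengthA()
print(m.arcs(m.start()))   # [Arc(ilabel='a', olabel='a', weight=1.0, nextstate=1)]
```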
On-Demand Composition [69, 53]

Create generalized state machine C for composition A ∘ B.

• C.start := (A.start, B.start)

• C.final((s1, s2)) := A.final(s1) ∧ B.final(s2)

• C.arcs((s1, s2)) := Merge(A.arcs(s1), B.arcs(s2))

Merged arcs defined as:

(l1, l3, x + y, (ns1, ns2)) ∈ Merge(A.arcs(s1), B.arcs(s2))

iff

(l1, l2, x, ns1) ∈ A.arcs(s1) and (l2, l3, y, ns2) ∈ B.arcs(s2)
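A minimal sketch of on-demand composition over the same three operations, assuming arc weights are costs (so they add, as in x + y above) and that there are no ε-labels; `TableMachine` and `LazyCompose` are hypothetical names:

```python
from typing import Dict, Hashable, Iterable, List, NamedTuple


class Arc(NamedTuple):
    ilabel: str
    olabel: str
    weight: float          # a cost (-log probability), so weights add
    nextstate: Hashable


class TableMachine:
    """Explicitly stored input machine (hypothetical helper for the example)."""
    def __init__(self, start: Hashable, finals: Iterable[Hashable],
                 arcs: Dict[Hashable, List[Arc]]):
        self._start, self._finals, self._arcs = start, set(finals), arcs

    def start(self) -> Hashable:
        return self._start

    def final(self, state: Hashable) -> bool:
        return state in self._finals

    def arcs(self, state: Hashable) -> List[Arc]:
        return self._arcs.get(state, [])


class LazyCompose:
    """C = A o B: a pair state (s1, s2) is expanded only when C.arcs is called on it."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def start(self):
        return (self.a.start(), self.b.start())

    def final(self, state) -> bool:
        s1, s2 = state
        return self.a.final(s1) and self.b.final(s2)

    def arcs(self, state) -> List[Arc]:
        s1, s2 = state
        return [Arc(x.ilabel, y.olabel, x.weight + y.weight, (x.nextstate, y.nextstate))
                for x in self.a.arcs(s1)
                for y in self.b.arcs(s2)
                if x.olabel == y.ilabel]        # the shared middle label l2


A = TableMachine(0, [1], {0: [Arc("a", "b", 1.0, 1)]})
B = TableMachine(0, [1], {0: [Arc("b", "c", 0.5, 1), Arc("x", "y", 2.0, 1)]})
C = LazyCompose(A, B)
print(C.arcs(C.start()))   # [Arc(ilabel='a', olabel='c', weight=1.5, nextstate=(1, 1))]
```

Because C itself satisfies the same interface, it can in turn be an input to another lazy composition, so a decoder only ever touches the pair states its search actually visits.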
State Caching

Create generalized state machine B for input machine A.

• B.start := A.start

• B.final(state) := A.final(state)

• B.arcs(state) := A.arcs(state)

Cache disciplines:

• Expand each state of A exactly once, i.e. always save in cache 
(memoize). 

• Cache, but forget 'old' states using a least-recently used criterion. 

• Use instructions (ref counts) from user (decoder) to save and forget. 
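The first two cache disciplines can be sketched with one wrapper: with no capacity it memoizes every expanded state; with a capacity it forgets the least recently used one. Class names and the toy machine are hypothetical, and decoder-driven reference counting is not shown:

```python
from collections import OrderedDict
from typing import Hashable, List, Optional


class CachedMachine:
    """B wraps A: B.arcs(s) calls A.arcs(s) only if s is not currently cached."""
    def __init__(self, machine, capacity: Optional[int] = None):
        self.machine = machine
        self.capacity = capacity        # None: memoize every expanded state
        self.cache: "OrderedDict[Hashable, List]" = OrderedDict()
        self.expansions = 0             # number of calls forwarded to A.arcs

    def start(self):
        return self.machine.start()

    def final(self, state) -> bool:
        return self.machine.final(state)

    def arcs(self, state) -> List:
        if state in self.cache:
            self.cache.move_to_end(state)        # mark as most recently used
            return self.cache[state]
        self.expansions += 1
        result = self.machine.arcs(state)
        self.cache[state] = result
        if self.capacity is not None and len(self.cache) > self.capacity:
            self.cache.popitem(last=False)       # forget the least recently used state
        return result


class Chain:
    """Hypothetical toy machine 0 -a-> 1 -a-> 2 -a-> 3, arcs as plain tuples."""
    def start(self): return 0
    def final(self, s): return s == 3
    def arcs(self, s): return [("a", "a", 1.0, s + 1)] if s < 3 else []


lru = CachedMachine(Chain(), capacity=1)
for s in [0, 0, 1, 0]:
    lru.arcs(s)
print(lru.expansions)   # 3: state 0 is expanded again after being evicted
```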






On-Demand Composition - Results

ATIS task: class-based trigram grammar, full cross-word triphonic context-dependency.

                 states         arcs
context          762            40386
lexicon          3150           4816
grammar          48758          359532
full expansion   1.6 × 10^…     5.1 × 10^…



For the same recognition accuracy as with a static, fully expanded 
network, on-demand composition expands just 1.6% of the total number 
of arcs. 
Determinization in Large Vocabulary Recognition

• For large vocabularies, 'string' lexicons are very non-deterministic

• Determinizing the lexicon solves this problem, but can introduce non-coaccessible states during its composition with the grammar

• Alternative solutions:

- Off-line compose, determinize, and minimize:

Lexicon ∘ Grammar

- Pre-tabulate non-coaccessible states in the composition of:

Det(Lexicon) ∘ Grammar
Search in Recognition Cascades

Reminder: Cost = −log probability

Example recognition problem: $\hat{w} = \operatorname{argmax}_w (O \circ A \circ D \circ M)(o, w)$

Viterbi search: approximate $\hat{w}$ by the output word sequence for the lowest-cost path from the start state to a final state in $O \circ A \circ D \circ M$; this ignores summing over multiple paths with the same output.

Composition preserves acyclicity; O is acyclic ⇒ acyclic search graph.
Single-source Shortest Path Algorithms [83]

• Meta-algorithm:

  Q ← {s_0}; ∀s, Cost(s) ← ∞; Cost(s_0) ← 0
  While Q not empty, s ← Dequeue(Q)
    For each s' ∈ Adj[s] such that Cost(s') > Cost(s) + cost(s, s'):
      Cost(s') ← Cost(s) + cost(s, s')
      Enqueue(Q, s')

• Specific algorithms:

Name           Queue type    Cycles   Neg. weights   Complexity
acyclic        topological   no       yes            O(|V| + |E|)
Dijkstra       best-first    yes      no             O(|E| log |V|)
Bellman-Ford   FIFO          yes      yes            O(|V| · |E|)
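The meta-algorithm can be sketched with a pluggable queue: a FIFO queue gives Bellman-Ford-style behavior (negative costs allowed, no negative cycles assumed), and a priority queue gives Dijkstra's algorithm (non-negative costs assumed). The topological-order variant for acyclic graphs is not shown; all names are illustrative.

```python
import heapq
import itertools
import math
from collections import deque
from typing import Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]    # s -> [(s', cost(s, s'))]


def shortest_distances(graph: Graph, source: Hashable, best_first: bool = False):
    """Relaxation meta-algorithm; the queue discipline picks the specific algorithm:
    FIFO (Bellman-Ford style, assumes no negative cycles) or best-first (Dijkstra,
    assumes non-negative costs)."""
    cost: Dict[Hashable, float] = {source: 0.0}
    tie = itertools.count()                     # breaks ties between equal costs
    heap = [(0.0, next(tie), source)]
    fifo, in_fifo = deque([source]), {source}
    while (heap if best_first else fifo):
        if best_first:
            c, _, s = heapq.heappop(heap)
            if c > cost[s]:
                continue                        # stale entry, s was improved later
        else:
            s = fifo.popleft()
            in_fifo.discard(s)
        for t, w in graph.get(s, []):
            if cost.get(t, math.inf) > cost[s] + w:     # relax edge (s, t)
                cost[t] = cost[s] + w
                if best_first:
                    heapq.heappush(heap, (cost[t], next(tie), t))
                elif t not in in_fifo:
                    fifo.append(t)
                    in_fifo.add(t)
    return cost


g = {"s": [("a", 1.0), ("b", 4.0)], "a": [("b", 2.0), ("c", 6.0)], "b": [("c", 1.0)]}
print(shortest_distances(g, "s"), shortest_distances(g, "s", best_first=True))
```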
The Search Problem

• Obvious first approach: use an appropriate single-source shortest-path algorithm

• Problem: it is impractical to visit all states; can we do better?

- Admissible methods: guarantee finding best path, but reorder 
search to avoid exploring provably bad regions 

- Non-admissible methods: may fail to find best path, but may need 
to explore much less of the graph 

• Current practical approaches: 

- Heuristic cost functions 

- Beam search 

- Multipass search 

- Rescoring 
Heuristic Cost Function: A* Search [4, 56, 17]

• States in search ordered by

cost-so-far(s) + lower-bound-to-complete(s)

• With a tight bound, states not on good paths are not explored

• With a loose lower bound, no better than Dijkstra's algorithm

• Where to find a tight bound? 

- Full search of a composition of smaller automata (homomorphic 
automata with lower-bounding costs?) 

- Non-admissible A* variants: use averaged estimate of 
cost-to-complete, not a lower-bound 
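A compact A* sketch on a toy graph, assuming h is a lower bound on the remaining cost. With h = 0 it expands states in the same order as Dijkstra's algorithm; with the exact remaining cost it expands only the states on the best path. The graph and heuristic values are made up:

```python
import heapq
import itertools
import math
from typing import Callable, Dict, Hashable, List, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def a_star(graph: Graph, source: Hashable, goal: Hashable,
           h: Callable[[Hashable], float]) -> Tuple[float, List[Hashable]]:
    """Expand states by cost-so-far(s) + h(s); h must never overestimate
    the remaining cost (a lower bound) for the result to be the best path."""
    g: Dict[Hashable, float] = {source: 0.0}
    tie = itertools.count()
    heap = [(h(source), next(tie), source)]
    expanded: List[Hashable] = []
    while heap:
        f, _, s = heapq.heappop(heap)
        if f > g[s] + h(s):
            continue                      # stale entry
        expanded.append(s)
        if s == goal:
            return g[s], expanded
        for t, c in graph.get(s, []):
            if g.get(t, math.inf) > g[s] + c:
                g[t] = g[s] + c
                heapq.heappush(heap, (g[t] + h(t), next(tie), t))
    return math.inf, expanded


graph = {"s": [("a", 1.0), ("b", 2.0)], "a": [("d", 1.0)], "d": [("t", 5.0)], "b": [("t", 2.0)]}
exact_remaining = {"s": 4.0, "a": 6.0, "d": 5.0, "b": 2.0, "t": 0.0}   # tightest lower bound
print(a_star(graph, "s", "t", lambda s: 0.0))        # loose bound: behaves like Dijkstra
print(a_star(graph, "s", "t", exact_remaining.__getitem__))
```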






Beam Search [35] 

• Only explore states with costs within a beam (threshold) of the cost 
of the best comparable state 

• Non-admissible 

• Comparable states = states corresponding to (approximately) the 
same observations 

• Synchronous (Viterbi) search: explore composition states in 
chronological observation order 

• Problem with synchronous beam search: too local, some observation 
subsequences are unreliable and may locally put the best overall path 
outside the beam 
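The sketch below illustrates synchronous beam pruning on a toy automaton in which every arc consumes one observation. The `emit` table stands in for acoustic scores; all names and numbers are hypothetical. With a narrow beam, the globally best path, which starts badly, is pruned after the first frame: this is the non-admissibility noted above.

```python
def beam_search(arcs, start, finals, observations, emit, beam):
    """Time-synchronous (Viterbi) search with beam pruning.

    arcs: {state: [(next_state, label, transition_cost), ...]}; every arc
    consumes exactly one observation.
    emit: {(label, observation): cost}, a stand-in for acoustic scores.
    Hypotheses at the same time frame are the 'comparable states'.
    """
    hyps = {start: (0.0, [])}
    for obs in observations:
        extended = {}
        for state, (cost, labels) in hyps.items():
            for dest, label, w in arcs.get(state, []):
                c = cost + w + emit[(label, obs)]
                if dest not in extended or c < extended[dest][0]:
                    extended[dest] = (c, labels + [label])  # Viterbi recombination
        if not extended:
            return None
        best = min(c for c, _ in extended.values())
        # Keep only the hypotheses within `beam` of the best one at this frame.
        hyps = {s: h for s, h in extended.items() if h[0] <= best + beam}
    complete = [h for s, h in hyps.items() if s in finals]
    return min(complete) if complete else None


ARCS = {0: [(1, "x", 0.0), (2, "y", 0.0)], 1: [(3, "x", 0.0)], 2: [(3, "y", 0.0)]}
EMIT = {("x", "o1"): 1.0, ("y", "o1"): 3.0, ("x", "o2"): 5.0, ("y", "o2"): 1.0}
```

With `beam=1.0` the search returns the path x x (cost 6), while a wider beam of 5.0 recovers the best path y y (cost 4).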
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Beam-Search Tradeoffs [68] 

Word lattice: result of composing observation sequence, level 
transducers and language model. 



Beam   Word lattice error rate   Median number of edges
4      7.3%                      86.5
6      5.4%                      244.5
8      4.4%                      827
10     4.1%                      3520
12     4.0%                      13813.5
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Multipass Search [52, 3, 68] 

• Use a succession of binary compositions instead of a single n-way 
composition — combinable with other methods 

• Prune: Use two-pass variant of composition to remove states not in 
any path close enough to the best 

• Pruned intermediate lattices are smaller, so fewer state pairings
are considered

• Approximate: use simpler models (context-independent phone 
models, low-order language models) 

• Rescore...
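The two-pass pruning step can be sketched as follows, assuming an acyclic lattice whose states are numbered in topological order (an assumption made to keep the code short). A forward pass computes the best cost from the start state, a backward pass the best cost to the final state, and any state or arc whose best complete path costs more than the best cost plus a threshold is removed.

```python
INF = float("inf")


def prune_lattice(num_states, arcs, start, final, threshold):
    """Two-pass pruning of an acyclic lattice.

    Assumes states are numbered 0..num_states-1 in topological order
    (every arc goes from a lower to a higher number) and arcs are
    (source, destination, cost) triples.  A state or arc survives only if
    the best complete path through it costs at most best + threshold.
    """
    forward = [INF] * num_states   # best cost from start to each state
    forward[start] = 0.0
    for src, dst, w in sorted(arcs):  # increasing source: forward pass
        forward[dst] = min(forward[dst], forward[src] + w)
    backward = [INF] * num_states  # best cost from each state to final
    backward[final] = 0.0
    for src, dst, w in sorted(arcs, reverse=True):  # decreasing source
        backward[src] = min(backward[src], w + backward[dst])
    limit = forward[final] + threshold
    states = {q for q in range(num_states) if forward[q] + backward[q] <= limit}
    kept = [(s, d, w) for s, d, w in arcs if forward[s] + w + backward[d] <= limit]
    return states, kept
```

The surviving lattice is then composed with the next model, which is why fewer state pairings need to be considered.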
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Rescoring 



Most successful approach in practice: 



Figure: approximate models → n best → rescoring with detailed models

• Small pruned result built by composing approximate models
• Composition with full models, observations
• Find lowest-cost path
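The rescoring step can be sketched as re-ranking an n-best list. Here the "detailed model" is a hypothetical bigram cost table, and the first-pass costs are treated as acoustic costs. Both are illustrative assumptions, not the models used in the original system.

```python
def rescore(nbest, detailed_lm, unseen_cost=10.0):
    """Re-rank an n-best list with a more detailed model.

    nbest: [(words, acoustic_cost)] produced by the approximate first pass.
    detailed_lm: {(previous_word, word): cost}, a hypothetical bigram table
    standing in for the 'detailed models'; unseen bigrams get unseen_cost.
    Returns the lowest-cost (words, total_cost).
    """
    best = None
    for words, acoustic_cost in nbest:
        history = ["<s>"] + words
        lm_cost = sum(detailed_lm.get(pair, unseen_cost) for pair in zip(history, words))
        total = acoustic_cost + lm_cost
        if best is None or total < best[1]:
            best = (words, total)
    return best


LM = {("<s>", "the"): 1.0, ("the", "cat"): 1.0, ("<s>", "a"): 2.0,
      ("a", "cat"): 1.0, ("the", "cap"): 4.0}
NBEST = [(["the", "cap"], 4.5), (["the", "cat"], 5.0), (["a", "cat"], 5.5)]
```

In this toy list, "the cap" has the lowest first-pass cost, but "the cat" wins after rescoring (total 7.0). Because only n hypotheses are scored, the detailed model never has to be composed with the full search space.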
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PART III 

Finite State Methods in Language 

Processing 

Richard Sproat 

Speech Synthesis Research Department 
Bell Laboratories, Lucent Technologies 

rws@bell-labs.com



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Overview 



• Text-analysis for Text-to-Speech (TTS) Synthesis 



- A rich domain with lots of linguistic problems 



- Probably the least familiar application of NLP technologies 



• Syntactic analysis 



• Some thoughts on text indexation 
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The Nature of the TTS Problem 



This is some text:
It was a dark and
stormy night. Four
score and seven
years ago. Now is
the time for all
good men. Let
them eat cake.
Quoth the raven
nevermore.

↓ Linguistic Analysis

phonemes, durations
and pitch contours

↓ Speech Synthesis

speech waveforms
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From Text to Linguistic Representation 
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Russian Percentages: The Problem 

How do you say '%' in Russian?

20% скидка   '20% discount'
с 20% раствором   'with 20% solution'

Adjectival forms when modifying nouns:

двадцати-процентная скидка   (dvadcati-procentnaja skidka)
с двадцати-процентным раствором   (s dvadcati-procentnym rastvorom)

Nominal forms otherwise:

21%   двадцать один процент   (dvadcat' odin procent)
23%   двадцать три процента   (dvadcat' tri procenta)
20%   двадцать процентов   (dvadcat' procentov)
с 20%   'with 20%'   с двадцатью процентами   (s dvadcat'ju procentami)
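The "nominal forms otherwise" column follows the standard Russian rule for nouns after cardinal numerals in nominative or accusative contexts. The sketch below simplifies that rule (the function name is hypothetical). It shows that, even in this case, the form chosen for '%' depends on the preceding number. The adjectival and instrumental forms above also depend on syntactic context, which this function ignores.

```python
def procent_form(n):
    """Form of 'процент' after a bare cardinal n (nominative/accusative contexts).

    Covers only the 'nominal forms otherwise' case; adjectival forms such as
    двадцати-процентная and oblique cases such as процентами need the
    surrounding syntax, so '%' cannot be expanded in isolation.
    """
    if 11 <= n % 100 <= 14:
        return "процентов"   # 11-14 always take the genitive plural
    if n % 10 == 1:
        return "процент"     # 1, 21, 31, ...: nominative singular
    if 2 <= n % 10 <= 4:
        return "процента"    # 2-4, 22-24, ...: genitive singular
    return "процентов"       # 0, 5-9, ...: genitive plural
```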
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Text Analysis Problems 



• Segment text into words.

• Segment text into sentences, checking for and expanding abbreviations:

  St Louis is in Missouri.

• Expand numbers

• Lexical and morphological analysis

• Word pronunciation

• Homograph disambiguation

• Phrasing

• Accentuation
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Desiderata for a Model of Text Analysis for TTS 



• Delay decisions until there is enough information to make them



• Possibly weight various alternatives 



Weighted Finite-State Transducers offer an attractive computational model 
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Overall Architectural Matters 

Example: word pronunciation in Russian 

• Text form: костра <kostra> (bonfire+genitive.singular)

• Morphological analysis:
кост'ёр{noun}{masc}{inan}+'а{sg}{gen}

• Pronunciation: /kAStr'a/ 

• Minimal Morphologically-Motivated Annotation (MMA) 

(Sproat, 1996) 
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Overall Architectural Matters 



Figure: overall architecture, with $L = D \circ M$. Components: Language Model; Morphological Analysis; Lexical Analysis WFST; Phonological Analysis WFST. The MMA (#КОСТР"А#) is related to the Surface Orthographic Form by the lexical analysis WFST and to the Pronunciation #kastr"a# by the phonological analysis WFST.
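Cascades like this one are built by composition. Below is a minimal sketch of composition for epsilon-free weighted transducers over the tropical semiring, where weights add along a path and the best path has minimum cost. The dict-based representation and helper names are assumptions made for illustration. Epsilon transitions, which a real implementation must handle with a composition filter, are left out.

```python
def compose(t1, t2):
    """Compose two epsilon-free weighted transducers over the tropical semiring.

    A transducer is {"start": q0, "finals": {q: weight},
    "arcs": {q: [(in_label, out_label, weight, next_q), ...]}}.
    Two arcs match when t1's output label equals t2's input label;
    their weights are added (tropical 'multiplication').
    """
    start = (t1["start"], t2["start"])
    arcs, finals = {}, {}
    stack, seen = [start], {start}
    while stack:
        q = stack.pop()
        q1, q2 = q
        arcs[q] = []
        if q1 in t1["finals"] and q2 in t2["finals"]:
            finals[q] = t1["finals"][q1] + t2["finals"][q2]
        for i1, o1, w1, n1 in t1["arcs"].get(q1, []):
            for i2, o2, w2, n2 in t2["arcs"].get(q2, []):
                if o1 == i2:
                    nq = (n1, n2)
                    arcs[q].append((i1, o2, w1 + w2, nq))
                    if nq not in seen:
                        seen.add(nq)
                        stack.append(nq)
    return {"start": start, "finals": finals, "arcs": arcs}


def best_output(t, inp):
    """Lowest-cost (cost, output) for input string inp (one symbol per arc), or None."""
    frontier = {t["start"]: (0.0, "")}
    for ch in inp:
        nxt = {}
        for q, (cost, out) in frontier.items():
            for i, o, w, n in t["arcs"].get(q, []):
                if i == ch and (n not in nxt or cost + w < nxt[n][0]):
                    nxt[n] = (cost + w, out + o)
        frontier = nxt
    ends = [(c + t["finals"][q], out) for q, (c, out) in frontier.items() if q in t["finals"]]
    return min(ends) if ends else None


T1 = {"start": 0, "finals": {0: 0.0}, "arcs": {0: [("a", "x", 1.0, 0), ("a", "y", 3.0, 0)]}}
T2 = {"start": 0, "finals": {0: 0.0}, "arcs": {0: [("x", "p", 5.0, 0), ("y", "q", 1.0, 0)]}}
```

In this toy example, T1 alone prefers output x (cost 1), but after composition with T2 the best output for "a" is q (cost 4). Committing early to T1's best choice would have been wrong, which is the "delay decisions" point made earlier.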
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Orthography Lexical Representation 

A Closer Look 



Figure: the transducer from orthography to lexical representation, built from the components
Words : Lex. Annot. ∘ Lex. Annot. : Lex. Anal.;
Punc. : Interp.;
Special Symbols : Expansions;
SPACE : Interp.;
Numerals : Expansions.

SPACE: white space in German, Spanish, Russian, ...
ε in Japanese, Chinese, ...
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Chinese Word Segmentation 



Figure: word lattice for 我忘不了解放大街在哪里, showing each candidate dictionary word with a part-of-speech tag and a cost. Candidates (pinyin, gloss): 了 'PERF'; 了解 liao3jie3 'understand'; 大 da4 'big'; 大街 da4jie1 'avenue'; 不 bu4 'not'; 在 zai4 'at'; 忘 wang4 'forget'; 忘不了 wang4+bu4liao3 'unable to forget'; 我 wo3 'I'; 放 fang4 'place'; 放大 fang4da4 'enlarge'; 哪里 na3li3 'where'; 街 jie1 'avenue'; 解放 jie3fang4 'liberation'; 解放大 xie4fang4da4 'NAME'.
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Chinese Word Segmentation 



$\mathit{Space} = \epsilon : \#$

$L = \mathit{Space}\,(\mathit{Dictionary}\,(\mathit{Space} \cup \mathit{Punc}))^*$

BestPath(我忘不了解放大街在哪里 ∘ L) =
我_pro 4.88 # 忘_vb+不了_npot 12.23 # 解放_nc 10.92 # 大街_nc 11.45
'I couldn't forget where Liberation Avenue is.'
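Because every word in L is a weighted dictionary entry, finding the best path through the segmentation lattice can be sketched as a dynamic program over string positions. The costs below are invented for illustration and are not the figures from the slide. The function stands in for BestPath(input ∘ L); it is not the original implementation.

```python
INF = float("inf")

# Hypothetical costs (negative log probabilities), not the values in the figure.
DICTIONARY = {
    "我": 5.0, "忘": 8.0, "不": 6.0, "了": 5.0, "了解": 8.0, "忘不了": 12.0,
    "解放": 11.0, "解放大": 14.0, "放": 9.0, "放大": 10.0, "大": 7.0,
    "大街": 11.0, "街": 9.0, "在": 5.0, "哪里": 9.0,
}


def segment(text, dictionary=DICTIONARY):
    """Lowest-cost segmentation of text into dictionary words (a best path)."""
    longest = max(len(word) for word in dictionary)
    best = [INF] * (len(text) + 1)  # best[i]: cost of the best segmentation of text[:i]
    back = [0] * (len(text) + 1)    # start index of the last word in that segmentation
    best[0] = 0.0
    for i in range(1, len(text) + 1):
        for j in range(max(0, i - longest), i):
            word = text[j:i]
            if word in dictionary and best[j] + dictionary[word] < best[i]:
                best[i] = best[j] + dictionary[word]
                back[i] = j
    if best[-1] == INF:
        return None  # no segmentation into dictionary words exists
    words, i = [], len(text)
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return words[::-1], best[-1]
```

With these costs, the competing analyses 了解 'understand', 放大 'enlarge' and 解放大 'NAME' all lose to 我 / 忘不了 / 解放 / 大街 / 在 / 哪里.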
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Numeral Expansion 



234 ∘ Factorization ⇒ $2 \cdot 10^2 + 3 \cdot 10^1 + 4$

∘ DecadeFlop ⇒ $2 \cdot 10^2 + 4 + 3 \cdot 10^1$

∘ NumberLexicon ⇒ zwei+hundert+vier+und+dreißig
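The three stages can be mimicked directly in Python for the numbers 1 to 999. This is a hedged sketch, not the WFST cascade itself. The function names mirror the stage names above, and the small German lexicon is written out by hand.

```python
UNITS = {1: "eins", 2: "zwei", 3: "drei", 4: "vier", 5: "fünf",
         6: "sechs", 7: "sieben", 8: "acht", 9: "neun"}
TEENS = ["zehn", "elf", "zwölf", "dreizehn", "vierzehn", "fünfzehn",
         "sechzehn", "siebzehn", "achtzehn", "neunzehn"]
TENS = {2: "zwanzig", 3: "dreißig", 4: "vierzig", 5: "fünfzig",
        6: "sechzig", 7: "siebzig", 8: "achtzig", 9: "neunzig"}


def factorize(n):
    """234 -> [(2, 2), (3, 1), (4, 0)], i.e. 2*10^2 + 3*10^1 + 4; zero digits dropped."""
    digits = str(n)
    return [(int(d), len(digits) - 1 - i) for i, d in enumerate(digits) if d != "0"]


def decade_flop(terms):
    """Put the units before the tens (20..99 only): [.., (3, 1), (4, 0)] -> [.., (4, 0), (3, 1)]."""
    if len(terms) >= 2 and terms[-2][1] == 1 and terms[-2][0] >= 2 and terms[-1][1] == 0:
        return terms[:-2] + [terms[-1], terms[-2]]
    return terms


def number_lexicon(terms):
    """Map factorized terms to German number words joined by '+'."""
    words, i = [], 0
    while i < len(terms):
        digit, power = terms[i]
        if power == 2:
            words += ["ein" if digit == 1 else UNITS[digit], "hundert"]
        elif power == 1 and digit == 1:  # 10..19 are single lexical items
            has_units = i + 1 < len(terms) and terms[i + 1][1] == 0
            words.append(TEENS[terms[i + 1][0] if has_units else 0])
            if has_units:
                i += 1  # the units term has been consumed
        elif power == 1:
            words.append(TENS[digit])
        elif i + 1 < len(terms):  # units followed by flopped tens: "vier+und"
            words += ["ein" if digit == 1 else UNITS[digit], "und"]
        else:
            words.append(UNITS[digit])
        i += 1
    return "+".join(words)


def expand(n):
    assert 1 <= n <= 999, "this sketch covers 1..999 only"
    return number_lexicon(decade_flop(factorize(n)))
```

For example, `expand(234)` yields `zwei+hundert+vier+und+dreißig`. The teens never undergo the flop because they are listed as whole words in the lexicon.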
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Numeral Expansion 
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German Numeral Lexicon 



{1} : 'eins{num}({masc}|{neut}){sg}{##}
{2} : zw'ei{num}{##}
{3} : dr'ei{num}{##}

({0}{+++}{1}{10^1}) : z'ehn{num}{##}
({1}{+++}{1}{10^1}) : 'elf{num}{##}
({2}{+++}{1}{10^1}) : zw'ölf{num}{##}
({3}{+++}{1}{10^1}) : dr'ei{++}zehn{num}{##}

({2}{10^1}) : zw'an{++}zig{num}{##}
({3}{10^1}) : dr'ei{++}ßig{num}{##}

({10^2}) : h'undert{num}{##}
({10^3}) : t'ausend{num}{neut}{##}
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Morphology: Paradigmatic Specifications 



Paradigm {A1}

# Strong inflection (e.g., after the indefinite article)

Suffix   {++}er   {sg}{masc}{nom}
Suffix   {++}en   {sg}{masc}({gen}|{dat}|{acc})
Suffix   {++}e    {sg}{femi}({nom}|{acc})
Suffix   {++}en   {sg}({femi}|{neut})({gen}|{dat})
Suffix   {++}es   {sg}{neut}({nom}|{acc})
Suffix   {++}e    {pl}({nom}|{acc})
Suffix   {++}er   {pl}{gen}
Suffix   {++}en   {pl}{dat}
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Morphology: Paradigmatic Specifications 



##### Possessives ("mein", "euer")

Paradigm {A6}

Suffix   {++}{Eps}   {sg}({masc}|{neut}){nom}
Suffix   {++}e       {sg}{femi}({nom}|{acc})
Suffix   {++}es      {sg}({masc}|{neut}){gen}
Suffix   {++}er      {sg}{femi}({gen}|{dat})
Suffix   {++}em      {sg}({masc}|{neut}){dat}
Suffix   {++}en      {sg}{masc}{acc}
Suffix   {++}{Eps}   {sg}{neut}{acc}
Suffix   {++}e       {pl}({nom}|{acc})
Suffix   {++}er      {pl}{gen}
Suffix   {++}en      {pl}{dat}
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Morphology: Paradigmatic Specifications 



{A1} : 'aal{++}glatt{adj}
{A1} : 'ab{++}änder{++}lich{adj}{umlt}
{A1} : 'ab{++}artig{adj}
{A1} : 'ab{++}bau{++}würdig{adj}{umlt}

{A6} : d'ein{adj}
{A6} : 'euer{adj}
{A6} : 'ihr{adj}
{A6} : 'Ihr{adj}
{A6} : m'ein{adj}
{A6} : s'ein{adj}
{A6} : 'unser{adj}
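Generating inflected entries from a stem list and a paradigm can be sketched as a cross product of stems and endings. The Python data structures below are hypothetical stand-ins for the lexicon and paradigm files. Only paradigm {A1} and two of its stems are included, and the surface form is obtained by simply stripping the annotations.

```python
import re

# Paradigm {A1} as listed above: (suffix, feature annotation).
A1 = [
    ("er", "{sg}{masc}{nom}"),
    ("en", "{sg}{masc}({gen}|{dat}|{acc})"),
    ("e", "{sg}{femi}({nom}|{acc})"),
    ("en", "{sg}({femi}|{neut})({gen}|{dat})"),
    ("es", "{sg}{neut}({nom}|{acc})"),
    ("e", "{pl}({nom}|{acc})"),
    ("er", "{pl}{gen}"),
    ("en", "{pl}{dat}"),
]
PARADIGMS = {"A1": A1}
LEXICON = [("A1", "'aal{++}glatt{adj}"), ("A1", "'ab{++}artig{adj}")]


def surface(stem):
    """Strip {...} annotations and stress marks: "'aal{++}glatt{adj}" -> "aalglatt"."""
    return re.sub(r"\{[^}]*\}|'", "", stem)


def inflect(lexicon, paradigms):
    """Yield (lexical analysis, surface form) for every stem/ending combination."""
    for paradigm, stem in lexicon:
        for suffix, features in paradigms[paradigm]:
            yield stem + "{++}" + suffix + features, surface(stem) + suffix
```

Each stem listed under {A1} receives all eight endings, so the two stems yield 16 analysis/surface pairs, such as `'aal{++}glatt{adj}{++}er{sg}{masc}{nom}` paired with `aalglatter`.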
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Morphology: Paradigmatic Specifications 

Project(({A6}--Endings) o (({A6}:Stems)--Id(Z*))) ^ 




Morphology: Finite-State Grammar 



START    PREFIX   {Eps}
PREFIX   STEM     {Eps}
PREFIX   STEM     t"ele{++}<1.0>

STEM     SUFFIX   abend
STEM     SUFFIX   'abenteuer

SUFFIX   PREFIX   {++}<1.0>
SUFFIX   FUGE     {Eps}<1.0>
SUFFIX   WORD     {Eps}<2.0>
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Morphology: Finite-State Grammar 



FUGE     SECOND   {++}<1.5>
FUGE     SECOND   {++}s{++}<1.5>
SECOND   PREFIX   {Eps}<1.0>
SECOND   STEM     {Eps}<2.0>
SECOND   WORD     {Eps}<2.0>
WORD






Morphology: Finite-State Grammar 



Unanständigkeitsunterstellung
'allegation of indecency'

"un{++}"an{++}st'änd{++}ig{++}keit{++}s{++}unter{++}st'ell{++}ung
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Rewrite Rule Compilation 

Context-dependent rewrite rules 

General form: $\phi \rightarrow \psi / \lambda \_\_ \rho$

$\phi, \psi, \lambda, \rho$ regular expressions.

Constraint: $\psi$ cannot be rewritten but can be used as a context
Example:

$a \rightarrow b / c \_\_ b$

(Johnson, 1972; Kaplan & Kay, 1994; Karttunen, 1995; Mohri & Sproat, 
1996) 
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Example 



$a \rightarrow b / c \_\_ b$



w = cab 
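For single-symbol contexts like these, the effect of the rule can be reproduced with a regular-expression substitution, as in the sketch below (the function name is hypothetical). It only illustrates the intended input/output behavior. Python's lookbehind must be fixed-width, so arbitrary regular contexts λ and ρ require the transducer construction described next.

```python
import re


def apply_a_to_b(w):
    """Apply the obligatory rule a -> b / c __ b to the string w.

    The lookbehind (?<=c) and lookahead (?=b) test the left and right
    contexts in the input string without consuming them.
    """
    return re.sub(r"(?<=c)a(?=b)", "b", w)
```

For w = cab, the result is cbb.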
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Example 



Input: 
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Example 



After replace: 
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Rewrite Rule Compilation 



• Principle 

- Based on the use of marking transducers 

- Brackets inserted only where needed 



• Efficiency 

- 3 determinizations + additional linear time work 

- Smaller number of compositions 
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Rule Compilation Method 



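$r \circ f \circ \mathit{replace} \circ l_1 \circ l_2$

$r: \Sigma^* \rho \rightarrow \Sigma^* {>}\, \rho$

$f: (\Sigma \cup \{>\})^* \phi\, {>} \rightarrow (\Sigma \cup \{>\})^* \{<_1, <_2\}\, \phi\, {>}$

$\mathit{replace}: {<_1}\, \phi\, {>} \rightarrow {<_1}\, \psi$

$l_1: \Sigma^* \lambda\, {<_1} \rightarrow \Sigma^* \lambda$

$l_2: \Sigma^* \bar{\lambda}\, {<_2} \rightarrow \Sigma^* \bar{\lambda}$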
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Marking Transducers 



Proposition. Let α be a deterministic automaton representing $\Sigma^*\beta$; then
the transducer τ post-marks occurrences of β by #.

Figure: final state q with entering and leaving transitions of Id(α).

Figure: states and transitions after modifications, transducer τ.
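Post-marking can be sketched as follows. The code builds the deterministic automaton for Σ*β with a KMP-style construction and emits # whenever the final state is entered. `pre_mark` inserts a marker before each occurrence by reversing the input, in the spirit of r = reverse(Marker(Σ* reverse(ρ), ...)). The function names are hypothetical, and β is assumed to be non-empty.

```python
def build_dfa(beta, alphabet):
    """Deterministic automaton for Sigma* beta (KMP-style construction).

    State q means: the longest suffix of the input read so far that is a
    prefix of beta has length q; state len(beta) is the final state.
    """
    m = len(beta)
    dfa = [dict() for _ in range(m + 1)]
    for c in alphabet:
        dfa[0][c] = 1 if c == beta[0] else 0
    restart = 0  # state reached by reading beta[1:q]
    for q in range(1, m + 1):
        dfa[q] = dict(dfa[restart])  # on a mismatch, behave like the restart state
        if q < m:
            dfa[q][beta[q]] = q + 1  # on a match, extend the prefix
            restart = dfa[restart][beta[q]]
    return dfa


def post_mark(text, beta, marker="#"):
    """Insert `marker` after every (possibly overlapping) occurrence of beta."""
    dfa = build_dfa(beta, set(text) | set(beta))
    state, out = 0, []
    for c in text:
        state = dfa[state][c]
        out.append(c)
        if state == len(beta):  # entered the final state: an occurrence ends here
            out.append(marker)
    return "".join(out)


def pre_mark(text, rho, marker=">"):
    """Insert `marker` before every occurrence of rho, using reversal as in r."""
    return post_mark(text[::-1], rho[::-1], marker)[::-1]
```

For the rule a → b / c __ b, `pre_mark("cab", "b")` gives `ca>b`, placing the marker in front of the right context.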
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Marker of Type 2 
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The Transducers as Expressions using Marker 



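$r = [\mathit{reverse}(\mathit{Marker}(\Sigma^* \mathit{reverse}(\rho), 1, \{>\}, \emptyset))]$

$f = [\mathit{reverse}(\mathit{Marker}((\Sigma \cup \{>\})^* \mathit{reverse}(\phi_{>}\, {>}), 1, \{<_1, <_2\}, \emptyset))]$

$l_1 = [\mathit{Marker}(\Sigma^* \lambda, 2, \emptyset, \{<_1\})]_{<_2:<_2}$

$l_2 = [\mathit{Marker}(\Sigma^* \lambda, 3, \emptyset, \{<_2\})]$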
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Example: r for rule a h/c h 




c:c 



Marker(JL*reverse(p), 1, {>}, 0) 




reverse(Marker(L*reverse(p), 1, {>}, 0)) 







The Replace Transducer 



Figure: arcs $\Sigma:\Sigma$, $<_2:<_2$, $>:\epsilon$
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Extension to Weighted Rules 



Weighted context-dependent rules: 

• $\phi, \lambda, \rho$ regular expressions,

• $\psi$ a formal power series on the tropical semiring

Example:

$c \rightarrow (.9c) + (.4t) / a \_\_ t$
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Rational power series 



Functions $S: \Sigma^* \rightarrow \mathbb{R}_+ \cup \{\infty\}$: rational power series

• Tropical semiring: $(\mathbb{R}_+ \cup \{\infty\}, \min, +)$

• Notation: $S = \sum_{w \in \Sigma^*} (S, w)\, w$

• Example: $S = (2a)(3b)(4b)(5b) + (5a)(3b)^*$

$(S, abbb) = \min\{2 + 3 + 4 + 5 = 14,\ 5 + 3 + 3 + 3 = 14\} = 14$

Theorem 6 (Schützenberger, 1961): S is rational iff it is recognizable
(representable by a weighted transducer).
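To make the example concrete, the series S can be represented by a small weighted automaton, and (S, w) can be computed as the minimum, over the paths labeled w, of the sum of the arc weights. The state numbering is an arbitrary choice for this sketch.

```python
INF = float("inf")

# Weighted automaton for S = (2a)(3b)(4b)(5b) + (5a)(3b)*:
# path 0-1-2-3-4 spells "abbb" with weights 2, 3, 4, 5;
# arc 0-5 spells "a" with weight 5, and the loop on 5 reads "b" with weight 3.
ARCS = {
    0: [("a", 2.0, 1), ("a", 5.0, 5)],
    1: [("b", 3.0, 2)],
    2: [("b", 4.0, 3)],
    3: [("b", 5.0, 4)],
    5: [("b", 3.0, 5)],
}
FINALS = {4, 5}


def coefficient(w):
    """(S, w) in the tropical semiring: min over paths of the sum of weights."""
    frontier = {0: 0.0}  # state -> best weight of a path reading the prefix so far
    for symbol in w:
        nxt = {}
        for q, weight in frontier.items():
            for label, arc_weight, dest in ARCS.get(q, []):
                if label == symbol:
                    # semiring "multiplication" is +, "addition" is min
                    nxt[dest] = min(nxt.get(dest, INF), weight + arc_weight)
        frontier = nxt
    return min((weight for q, weight in frontier.items() if q in FINALS), default=INF)
```

Strings outside the support of S, such as "ba", receive the semiring zero, ∞.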






Compilation of weighted rules 



• Extension of the composition algorithm to the weighted case 

- Efficient filter for ε-transitions

- Addition of weights of matching labels 



• Same compilation algorithm 



• Single-source shortest paths algorithms to find the best path 
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Rewrite Rules: An Example 



s → z / __ ($ | #) VStop ;



Figure: the compiled Voicing transducer (arcs include s:z, V:V, #:#, VStop:VStop).



/mis$mo$/ ∘ Voicing = /miz$mo$/
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Syllable structure 



(c* V ct.o $ )+ n 



(E* $ (CCn-(CGUOL))E*)n 



(E* $ ([+cor]U/x/U//?/)/l/E*)n 



(L* $ ([+cor,+strid] U/x/U//?/)/r/Z*) 



estrella: BestPath(/estreya/ ∘ Intro($) ∘ Syl) = /es$tre$ya$/

atlas: BestPath(/atlas/ ∘ Intro($) ∘ Syl) = /at$las$/
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Russian Percentage Expansion: An example 



с 5% скидкой   'with a 5% discount'

∘ Lexical Analysis FST =

s_{prep} pjat'_{num+nom}-procent_{nadj}+aja_{fem+sg+nom} skidk_{fem}oj_{sg+instr}
∪ s_{prep} pjat_{num}i_{gen}-procent_{nadj}+oj_{fem+sg+instr} skidk_{fem}oj_{sg+instr} 2.0
∪ s_{prep} pjat'_{num}ju_{instr} procent_{noun}+ami_{pl+instr} skidk_{fem}oj_{sg+instr} 4.0
∪ ...
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Language Model FSTs: 



/ procentnoun H ^#)* #_ (L H ^#) 
e / procentnadj (I^ H ^#)* #_ (£ H ^#) 



* 

noun 

* 

noun 



/ procentn (£ n ^#)*casen-instr# _ n ^#) 
/ procentn (L n -#)*sg+case# _ (^^ H 



* 

instr 




s pjati_{gen}-procent_{nadj}oj_{sg+instr} skidkoj
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Percentage Expansion: Continued 



с 5% скидкой

s pjati_{gen}-procent_{nadj}oj_{sg+instr} skidkoj



LoP 



s # PiT" !p~r@c"Entn&Y # sK"!tk&Y 
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Phrasing Prediction 

Problem: predict intonational phrase boundaries in long
unpunctuated utterances:

For his part, Clinton told reporters in Little Rock, Ark., on Wednesday
| that the pact can be a good thing for America || if we change our
economic policy || to rebuild American industry here at home || and if
we get the kind of guarantees we need on environmental and labor
standards in Mexico || and a real plan || to help the people who will
be dislocated by it.

Bell Labs synthesizer uses a CART-based predictor trained on labeled 
corpora (Wang & Hirschberg 1992). 






Phrasing Prediction: Variables 

For each pair ⟨w_i, w_j⟩:

• length of utterance; distance of w_i in syllables/stressed syllables/words ... from the beginning/end of the sentence

• automatically predicted pitch accent for w_i and w_j

• part-of-speech (POS) for a 4-word window around ⟨w_i, w_j⟩

• (largest syntactic constituent dominating w_i but not w_j and vice
versa, and smallest constituent dominating them both)

• whether ⟨w_i, w_j⟩ is dominated by an NP and, if so, distance of
w_i from the beginning of that NP, the length of the NP, and the ratio distance/length

• (mutual information scores for a four-word window around ⟨w_i, w_j⟩)

The most successful of these predictors so far appear to be POS, some 
constituency information, and mutual information 
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Phrasing Prediction: Sample Tree 
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Phrasing Prediction: Results 

Results for multi-speaker read speech:

- major boundaries only: 91.2% 

- collapsed major/minor phrases: 88.4% 

- 3-way distinction between major, minor and null boundary: 
81.9% 

Results for spontaneous speech: 

- major boundaries only: 88.2% 

- collapsed major/minor phrases: 84.4% 

- 3-way distinction between major, minor and null boundary: 
78.9% 

Results for 85K words of hand-annotated text, cross-validated on 
training data: 95.4%. 
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Tree-Based Modeling: Prosodic Phrase Prediction 
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The Tree Compilation Algorithm 

(Sproat & Riley, 1996) 

• Each leaf node corresponds to a single rule defining a constrained weighted
mapping for the input symbol associated with the tree

• Decisions at each node are stateable as regular expressions restricting the left
or right context of the rule(s) dominated by the branch

• The full left/right context of the rule at a leaf node is derived by intersecting
the expressions traversed between the root and the leaf node

• The transducer for the entire tree represents the conjunction of all the 
constraints expressed at the leaf nodes; it is derived by intersecting together 
the set of WFSTs corresponding to each of the leaves 

- Note that intersection is defined for transducers that express same-length 
relations 

• The alphabet is defined to be an alphabet of all correspondence pairs that 
were determined empirically to be possible 
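The compilation idea can be illustrated with sets instead of automata. Each branch condition restricts the tag allowed on one side, and a leaf's context is the intersection of the conditions along its path from the root. The tag alphabet, tree and output costs below are all hypothetical. A real compiler intersects regular languages and weighted transducers rather than finite sets.

```python
ALPHABET = frozenset({"N", "V", "A", "Adv", "D"})  # toy POS alphabet (hypothetical)


def compile_tree(node, left=ALPHABET, right=ALPHABET):
    """Turn a decision tree into one (left, right, outputs) rule per leaf.

    Internal node: ("split", side, allowed_tags, yes_subtree, no_subtree),
    asking whether the left/right neighbour's tag is in allowed_tags.
    Leaf: ("leaf", {output: cost}).  A leaf's context is the intersection
    of the conditions on the path from the root.
    """
    if node[0] == "leaf":
        return [(left, right, node[1])]
    _, side, allowed, yes, no = node
    allowed = frozenset(allowed)
    if side == "left":
        return (compile_tree(yes, left & allowed, right)
                + compile_tree(no, left - allowed, right))
    return (compile_tree(yes, left, right & allowed)
            + compile_tree(no, left, right - allowed))


def lookup(rules, left_tag, right_tag):
    """Return the weighted outputs of the rules whose contexts match."""
    return [out for l, r, out in rules if left_tag in l and right_tag in r]


# Hypothetical tree: "#" = phrase boundary, "" = no boundary, with costs.
TREE = ("split", "right", {"N", "A"},
        ("leaf", {"#": 3.0, "": 0.1}),
        ("split", "left", {"V"},
         ("leaf", {"#": 1.0, "": 1.0}),
         ("leaf", {"#": 0.5, "": 2.0})))
```

Because the yes/no branches partition the contexts, exactly one leaf rule applies to any pair of neighbouring tags.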
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Interpretation of Tree as a Ruleset 



Figure: Node 16 of the tree, with the context restrictions on the path from the root (e.g. Σ*(N ∪ V ∪ A ∪ Adv ∪ D) and N ∪ A) and the weighted rule for # derived from them.
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Summary of Compilation Algorithm 

Each rule represents a weighted two-level surface coercion rule 



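$\mathit{Rule}_L = \mathit{Compile}(\phi_T \rightarrow \psi_L / \lambda_L \_\_ \rho_L)$

Each tree/forest represents a set of simultaneous weighted two-level
surface coercion rules

$\mathit{Rule}_T = \bigcap_{L} \mathit{Rule}_L$

$\mathit{Rule}_F = \bigcap_{T} \mathit{Rule}_T$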



BestPath(,D#N#V#Adv#D#A#N o Tree) ^ ,D#N#V#AdvmD#A#N2.76 
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Lexical Ambiguity Resolution 

• Word sense disambiguation:
She handed down a harsh sentence.   peine
This sentence is ungrammatical.   phrase

• Homograph disambiguation:
He plays bass.   /beɪs/
This lake contains a lot of bass.   /bæs/

• Diacritic restoration:
appeler l'autre côté de l'Atlantique   côté 'side'
Côte d'Azur   côte 'coast'

(Yarowsky, 1992; Yarowsky 1996; Sproat, Hirschberg & Yarowsky, 1992; 
Hearst 1991) 
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Homograph Disambiguation 1 



Evidence            lɛd    lid    Logprob

• N-Grams

lead level/N        219    –      11.10
of lead in          162    –      10.66
the lead in         –      301    10.59
lead poisoning      110    –      10.16
lead role           –      285    10.51
narrow lead         –      70     8.49

• Predicate-Argument Relationships

follow/V + lead     –      527    11.40
take/V + lead       1      665    7.76

• Wide Context

zinc ↔ lead         235    –      11.20
copper ↔ lead       130    –      10.35

• Other Features (e.g. Capitalization)
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Homograph Disambiguation 2 



Sort by $\left|\log \dfrac{\Pr(\mathit{Pron}_1 \mid \mathit{Collocation}_i)}{\Pr(\mathit{Pron}_2 \mid \mathit{Collocation}_i)}\right|$

Decision List for lead

Logprob   Evidence          Pronunciation
11.40     follow/V + lead   ⇒ lid
11.20     zinc ↔ lead       ⇒ lɛd
11.10     lead level/N      ⇒ lɛd
10.66     of lead in        ⇒ lɛd
10.59     the lead in       ⇒ lid
10.51     lead role         ⇒ lid
10.35     copper ↔ lead     ⇒ lɛd
10.28     lead time         ⇒ lid
10.16     lead poisoning    ⇒ lɛd
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Homograph Disambiguation 3: Pruning 

• Redundancy by subsumption

Evidence          lɛd    lid    Logprob
lead level/N      219    –      11.10
lead levels       167    –      10.66
lead level        52     –      8.93

• Redundancy by association

Evidence             tɛɚ    tɪɚ
tear gas             –      1671
tear ↔ police        –      286
tear ↔ riot          –      78
tear ↔ protesters    –      71
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Homograph Disambiguation 4: Use 

Choose single best piece of matching evidence.

Decision List for lead

Logprob   Evidence          Pronunciation
11.40     follow/V + lead   ⇒ lid
11.20     zinc ↔ lead       ⇒ lɛd
11.10     lead level/N      ⇒ lɛd
10.66     of lead in        ⇒ lɛd
10.59     the lead in       ⇒ lid
10.51     lead role         ⇒ lid
10.35     copper ↔ lead     ⇒ lɛd
10.28     lead time         ⇒ lid
10.16     lead poisoning    ⇒ lɛd
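Applying the decision list means scanning it in order of decreasing score and returning the pronunciation attached to the first piece of evidence found in the context. The minimal sketch below uses the scores and evidence from the list above. The step that detects which collocations occur in a sentence is assumed to happen elsewhere.

```python
# Decision list for "lead": (log-likelihood score, evidence, pronunciation).
LEAD_LIST = [
    (11.40, "follow/V + lead", "lid"),
    (11.20, "zinc ↔ lead", "lɛd"),
    (11.10, "lead level/N", "lɛd"),
    (10.66, "of lead in", "lɛd"),
    (10.59, "the lead in", "lid"),
    (10.51, "lead role", "lid"),
    (10.35, "copper ↔ lead", "lɛd"),
    (10.28, "lead time", "lid"),
    (10.16, "lead poisoning", "lɛd"),
]


def disambiguate(evidence_found, decision_list=LEAD_LIST, default=None):
    """Return the pronunciation of the single best matching piece of evidence."""
    for score, evidence, pron in sorted(decision_list, reverse=True):
        if evidence in evidence_found:
            return pron  # later, weaker evidence is ignored
    return default
```

If both "the lead in" and "lead poisoning" are present, the higher-ranked "the lead in" decides (lid). Lower-ranked evidence is never combined with it.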
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Homograph Disambiguation: Evaluation 



Word       Pron1       Pron2       Sample Size   Prior   Performance
lives      laɪvz       lɪvz        33186         .69     .98
wound      waʊnd       wund        4483          .55     .98
Nice       naɪs        nis         573           .56     .94
Begin      bɪˈgɪn      ˈbeɪgɪn     1143          .75     .97
Chi        tʃi         kaɪ         1288          .53     .98
Colon      koʊˈloʊn    ˈkoʊlən     1984          .69     .98
lead (N)   lid         lɛd         12165         .66     .98
tear (N)   tɛɚ         tɪɚ         2271          .88     .97
axes (N)   ˈæksɪz      ˈæksiz      1344          .72     .96
IV         aɪ vi       fɔr         1442          .76     .98
Jan        dʒæn        jɑn         1327          .90     .98
routed     rutɪd       raʊtɪd      589           .60     .94
bass       beɪs        bæs         1865          .57     .99
TOTAL                              63660         .67     .97
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Decision Lists: Summary 



• Efficient and flexible use of data. 



• Easy to interpret and modify. 
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Decision Lists as WFSTs 

The lead example 



• Construct 'homograph taggers' H0, H1, ... that find and tag instances
of a homograph set in a lexical analysis. For example, H1 is:
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Decision Lists as WFSTs 



• Construct an environmental classifier consisting of a pair of transducers C1 and C2,
where

- C1 optionally rewrites any symbol except the word boundary or the homograph tags
H0, H1, ... as a single dummy symbol A

- C2 classifies contextual evidence from the decision list according to its type, and
assigns a cost equal to the position of the evidence in the list; otherwise it passes
A, the word boundary and H0, H1, ... through:



## follow vb ## 


##AV0##<1> 


## zinc nn ## 


##AC1##<2> 


## level(s?) nn ## - 


##AR1##<3> 


##ofpp## 


##A[1##<2> 


## in pp ## 


##A1]##<2> 
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Decision Lists as WFSTs 



• Construct a disambiguator D from a set of optional rules of the form: 



H0 → ε / V0 Σ* __

H1 → ε / C1 Σ* __

H1 → ε / __ Σ* C1

H0 → ε / __ ## A* R0

H1 → ε / __ ## A* R1

H0 → ε / [0 ## A* __ ## A* 0]

H1 → ε / [1 ## A* __ ## A* 1]

H0 → ε <20>

H1 → ε <40>



• Construct a filter F that removes all paths containing H0, H1, ...
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Decision Lists as WFSTs 



• Let an example input T be: 



• Then the disambiguated input T' is given by: 



$T \cap \mathit{Project}[\mathit{BestPath}[T \circ H_0 \circ H_1 \circ C_1 \circ C_2 \circ D \circ F]]$
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Syntactic Parsing and Analysis 



• Intersection grammars (Voutilainen, 1994, inter alia) 



• FST simulation of top-down parsing (Roche, 1996) 



• Local grammars implemented as failure function automata (Mohri, 
1994) 
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Intersection Grammars 



Text automaton consisting of all possible lexical analyses of the 
input, including analysis of boundaries. 

Figure: text automaton for "the cans hold tuna", with alternative lexical analyses and word boundaries on its arcs (e.g. the dt; hold vb pl; tuna nn sg).
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• Series of syntactic FSAs to be intersected with the text automaton, 
constraining it. 







Figure: a syntactic FSA constraining the text automaton (arcs include dt, cans, hold, tuna, factory).



Experimental grammars with a couple of hundred rules have been 
constructed. 
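Automata are intersected with the product construction. The sketch below intersects a toy text automaton for "the cans hold tuna" (tags only) with two hypothetical constraint automata. These constraints are invented for illustration and are not the grammars mentioned above.

```python
def intersect(a, b):
    """Product construction: the intersection of two (partial) DFAs.

    A DFA is (start, finals, delta) with delta: {(state, symbol): state};
    missing transitions reject.
    """
    (s1, f1, d1), (s2, f2, d2) = a, b
    symbols = {sym for _, sym in d1} & {sym for _, sym in d2}
    start = (s1, s2)
    delta, finals, todo, seen = {}, set(), [start], {start}
    while todo:
        p, q = pair = todo.pop()
        if p in f1 and q in f2:
            finals.add(pair)
        for sym in symbols:
            if (p, sym) in d1 and (q, sym) in d2:
                nxt = (d1[(p, sym)], d2[(q, sym)])
                delta[(pair, sym)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return start, finals, delta


def accepts(dfa, symbols):
    state, finals, delta = dfa
    for sym in symbols:
        if (state, sym) not in delta:
            return False
        state = delta[(state, sym)]
    return state in finals


TAGS = ("dt", "nn", "vb")
# Toy text automaton: the/dt, cans/{nn, vb}, hold/{vb, nn}, tuna/nn.
TEXT = (0, {4}, {(0, "dt"): 1, (1, "nn"): 2, (1, "vb"): 2,
                 (2, "vb"): 3, (2, "nn"): 3, (3, "nn"): 4})
# Hypothetical constraint 1: no vb right after dt (state 1 = "just read dt").
NO_DT_VB = (0, {0, 1}, {(q, t): int(t == "dt") for q in (0, 1) for t in TAGS
                        if not (q == 1 and t == "vb")})
# Hypothetical constraint 2: at least one vb (state 1 = "a vb has been read").
HAS_VB = (0, {1}, {(q, t): int(q == 1 or t == "vb") for q in (0, 1) for t in TAGS})
```

Intersecting TEXT with both constraints leaves only the analysis dt nn vb nn ("the cans/nn hold/vb tuna").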
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Top Down Parsing 

S = [S the cans hold tuna S] 

Figure: the transducer T_dic (arcs include cans:cans).




S ∘ T_dic = (S [NP NP] [VP the cans hold tuna VP] S)

(S [NP the NP] [VP cans hold tuna VP] S) 
(S [NP the cans NP] [VP hold tuna VP] S) 
(S [NP the cans hold NP] [VP tuna VP] S) 
(S [NP the cans hold tuna NP] [VP VP] S) 

S ∘ T_dic ∘ T_dic = (S (NP the cans NP) (VP hold [NP tuna NP] VP) S)
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Local Grammars 



• Descriptions of local syntactic phenomena, compiled into efficient, compact
deterministic automata using failure functions (cf. the use of failure functions with
(sets of) strings familiar from string matching, e.g. Crochemore & Rytter, 1994)



• Descriptions may be negative or positive. 
Example of a negative constraint: 

- Let L(G) = DT @ WORD VB

- Construct a deterministic automaton for $\Sigma^* L(G)$

- Given a sentence L(S), compute $L(S) - \Sigma^* L(G) \Sigma^*$
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Indexation of natural language texts 

Motivation 

- Use of linguistic knowledge in indexation 

- Optimal complexities 

* Preprocessing of a text t: $O(|t|)$

* Search for positions of a string x:

$O(|x| + \mathit{NumOccurrences}(x))$

Efficient indexation algorithms exist (PAT), but they are not convenient
for use with large amounts of linguistic information
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Example (1) 




Figure 27: Indexation with subsequential transducers t = aabba. 
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Example (2) 



{2} 




{3,4} 

Figure 28: Indexation with automata t = aabba. 
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Algorithms 

• Based on the definition of an equivalence relation R on $\Sigma^*$:

$w_1 \mathrel{R} w_2$ iff $w_1$ and $w_2$ have the same set of ending positions in t

• Construction 

- Minimal machines (subsequential transducer or automaton) 

- Use of a failure function to distinguish equivalence classes 

• Can be adapted to natural language text 
(not storing list of positions of short words) 
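For a short text, the equivalence classes can be listed by brute force, as in the sketch below. It uses the text t = aabba from Figures 27 and 28, with 1-based positions. Each class of non-empty factors corresponds to a state of the suffix automaton (DAWG) of t, besides the initial state. The brute-force enumeration is for illustration only; the cited constructions build the machines directly.

```python
from collections import defaultdict


def end_position_classes(t):
    """Group the factors of t by their set of ending positions (1-based).

    Two strings are equivalent iff they end at exactly the same positions of t.
    """
    classes = defaultdict(set)
    n = len(t)
    for i in range(n):
        for j in range(i + 1, n + 1):
            w = t[i:j]
            # an occurrence starting at 0-based k ends at 1-based position k + len(w)
            ends = frozenset(k + len(w) for k in range(n - len(w) + 1) if t.startswith(w, k))
            classes[ends].add(w)
    return dict(classes)
```

For t = aabba there are six classes: for instance, "ab" and "aab" both end only at position 3, and "b" ends at positions 3 and 4.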
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Indexation with finite-state machines 

Complexity 

- Transducers (Crochemore, 1986): preprocessing $O(|t|)$, search
$O(|x| + \mathit{NumOccurrences}(x))$ if using complex labels

- Automata (Mohri, 1996b): preprocessing quadratic, search
$O(|x| + \mathit{NumOccurrences}(x))$

Advantage: use of linguistic information 

- Extended search: composition with morphological transducer 

- Refinement: composition with finite-state grammar 

Applications to WWW (Internet) 
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